How to capture NCCL communication ops in FakeTensorMode?

Hi team, I’m tracing a 2D parallel graph with FakeTensor and FakeTensorMode, but it throws the error below:

```
Exception has occurred: UnsupportedOperatorException
c10d.allreduce_.default
  File "/usr/local/conda/lib/python3.9/site-packages/torch/_subclasses/fake_tensor.py", line 1404, in dispatch
    r = func(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/torch/_ops.py", line 437, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'c10d::allreduce_' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'c10d::allreduce_' is only available for these backends: [CPU, CUDA, PrivateUse1, SparseCPU, SparseCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
```

The rest of the stack is:

```
During handling of the above exception, another exception occurred:

  File "/usr/local/conda/lib/python3.9/site-packages/torch/_subclasses/fake_tensor.py", line 1408, in dispatch
    raise UnsupportedOperatorException(func)
```

The PyTorch version is 2.1.0.

So, is there any way to get past this error? Thanks!

You can provide a meta device implementation of that op yourself from Python; that will get you past this step.
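For reference, here is a minimal sketch of such a registration. It assumes the 2.1 schema of `c10d::allreduce_` (tensor list, process group, reduce op, optional sparse indices, timeout), and it fakes the `Work` return value by wrapping an already-completed future, the same trick `torch.testing._internal.distributed.multi_threaded_pg` uses; double-check both assumptions against your build:

```python
import torch
from torch._C._distributed_c10d import _create_work_from_future
from torch.futures import Future

# Register a Python kernel for c10d::allreduce_ on the Meta backend so
# FakeTensorMode can trace through it.
meta_lib = torch.library.Library("c10d", "IMPL", "Meta")

def _allreduce__meta(tensors, process_group, reduce_op, sparse_indices, timeout):
    # allreduce_ is in-place, so the output metadata (shapes/dtypes) is
    # identical to the input; returning the inputs is enough for tracing.
    # The op also returns a c10d Work handle; fake one by wrapping a
    # future that is already marked complete.
    fut = Future()
    fut.set_result(tensors)
    return tensors, _create_work_from_future(fut)

meta_lib.impl("allreduce_", _allreduce__meta)
```

The same pattern should apply to the other c10d collectives that raise this error under FakeTensorMode.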

Be aware that c10d ops cannot be used with Dynamo, as they take dynamically created torchbind objects that will be baked in as constants in your trace.

Hi, here is the Meta version of `allreduce_` that I implemented, but there is a problem with the `Work` return value. Is there any way to solve it?
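For reference, the exact schema the Meta kernel has to satisfy, including the torchbind `Work` return type, can be printed from the dispatcher (assuming the overload name is `default`, as shown in the error above):

```python
import torch

# Print the registered schema for c10d::allreduce_; the return type shows
# the __torch__.torch.classes.c10d.Work class the Meta kernel must produce.
print(torch.ops.c10d.allreduce_.default._schema)
```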