How to capture NCCL communication ops in FakeTensorMode?

Hi team, I’m tracing a 2D parallel graph with FakeTensor and FakeTensorMode, but it throws the error below:

```
Exception has occurred: UnsupportedOperatorException
c10d.allreduce_.default
  File "/usr/local/conda/lib/python3.9/site-packages/torch/_subclasses/fake_tensor.py", line 1404, in dispatch
    r = func(*args, **kwargs)
  File "/usr/local/conda/lib/python3.9/site-packages/torch/_ops.py", line 437, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'c10d::allreduce_' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'c10d::allreduce_' is only available for these backends: [CPU, CUDA, PrivateUse1, SparseCPU, SparseCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
```

The rest of the stack is:

```
During handling of the above exception, another exception occurred:

  File "/usr/local/conda/lib/python3.9/site-packages/torch/_subclasses/fake_tensor.py", line 1408, in dispatch
    raise UnsupportedOperatorException(func)
```

The PyTorch version is 2.1.0.

So, is there any way to get past this error? Thanks!

You can provide a meta device implementation of that op yourself from Python; that will get you past this step.
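For reference, here is a minimal sketch of such a registration. It assumes the 2.1 schema of `c10d::allreduce_` (tensor list, process group, reduce op, optional sparse indices, timeout), and it fakes the `Work` return value by wrapping an already-completed future, the same trick `torch.testing._internal.distributed.multi_threaded_pg` uses; double-check both assumptions against your build:

```python
import torch
from torch._C._distributed_c10d import _create_work_from_future
from torch.futures import Future

# Register a Python kernel for c10d::allreduce_ on the Meta backend so
# FakeTensorMode can trace through it.
meta_lib = torch.library.Library("c10d", "IMPL", "Meta")

def _allreduce__meta(tensors, process_group, reduce_op, sparse_indices, timeout):
    # allreduce_ is in-place, so the output metadata (shapes/dtypes) is
    # identical to the input; returning the inputs is enough for tracing.
    # The op also returns a c10d Work handle; fake one by wrapping a
    # future that is already marked complete.
    fut = Future()
    fut.set_result(tensors)
    return tensors, _create_work_from_future(fut)

meta_lib.impl("allreduce_", _allreduce__meta)
```

The same pattern should apply to the other c10d collectives that raise this error under FakeTensorMode.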

Be aware that c10d ops cannot be used with Dynamo, as they take dynamically created torchbind objects that will be baked in as constants in your trace.

Hi, here is the Meta version of `allreduce_` that I implemented, but there is a problem with the `Work` return value. Is there any way to solve it?
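For reference, the exact schema the Meta kernel has to satisfy, including the torchbind `Work` return type, can be printed from the dispatcher (assuming the overload name is `default`, as shown in the error above):

```python
import torch

# Print the registered schema for c10d::allreduce_; the return type shows
# the __torch__.torch.classes.c10d.Work class the Meta kernel must produce.
print(torch.ops.c10d.allreduce_.default._schema)
```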