Example inputs to compilers are now fake tensors

(Editor’s note: I meant to send this in December, but forgot. Here you go, later than it should have been!)

The merged PR at Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time [Merger of 89672 and 89773] by voznesenskym · Pull Request #90039 · pytorch/pytorch · GitHub changes how Dynamo invokes backends: instead of passing real tensors as example inputs, we now pass fake tensors, which don’t contain any actual data.

The motivation for this PR comes from the dynamic shapes workstream. The essential problem is this: when compiling for dynamic shapes, you don’t want a backend compiler to specialize on the exact sizes the input tensors had. Instead, you want the compiler to be parametric over the sizes, and perhaps only peek at the real size occasionally (introducing a guard) when it would really benefit from specializing to a specific size.

There is no way to enforce this if we pass real tensors to backends: the real tensors have all of the real shapes, and say nothing about what relationships the sizes have symbolically. However, if we pass fake tensors, we can also replace the concrete sizes with symbolic sizes. If you use those sizes without inspecting their concrete values, you end up with a compiled result that works with arbitrary choices of sizes; if you perform boolean tests on the sizes, we automatically introduce guards that limit the validity of the graph to whenever those guards are true.
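To make this concrete, here is a minimal sketch of a custom backend (the backend and its specialization logic are made up for illustration): the example inputs it receives are fake tensors, and any boolean test on their sizes causes Dynamo to record a guard.

import torch

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    x = example_inputs[0]  # a FakeTensor; its sizes may be symbolic
    # Passing sizes through symbolically keeps the compiled result valid for
    # any input size.  A boolean test like the one below peeks at the concrete
    # value, so a guard is recorded and the graph is only reused while it holds.
    if x.shape[0] >= 2:
        pass  # a real backend might emit code specialized for batch >= 2 here
    return gm.forward  # fall back to running the captured graph eagerly

compiled = torch.compile(torch.nn.Linear(8, 8), backend=my_backend, dynamic=True)
compiled(torch.randn(4, 8))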

There is also a minor side benefit to passing fake tensors: with real tensors, it’s easy to believe that to perform static analysis on the graph (e.g., shape propagation), you have to actually run the graph (with real tensor inputs). This is very slow (since you’re running the real tensor compute at compile time) and uses up a lot of memory. Fake tensors encourage you to make use of FakeTensorMode, which lets you run tensor computation without actually doing any of the real work.
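As a rough sketch (FakeTensorMode lives in the internal torch._subclasses module, so the exact import may vary by version):

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    # Neither the "allocations" nor the matmul touch real data or memory;
    # only metadata (shapes, dtypes, devices) is propagated.
    a = torch.empty(4096, 4096)
    b = torch.empty(4096, 4096)
    c = a @ b

print(c.shape, c.dtype)  # torch.Size([4096, 4096]) torch.float32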

Hopefully, the changes needed to make things work with fake tensors are straightforward. If you have any questions, please ask them here!


Hi @ezyang, how do you recommend making custom PyTorch ops better support FakeTensor?

I want to run my model with torch.compile; however, the model uses a couple of custom ops, from which I get errors like

RuntimeError: Failed running call_function my_custom_ops.my_custom_function(*(FakeTensor(FakeTensor(..., device='meta', size=(1, 109200, 8, 16),
           grad_fn=<ViewBackward0>), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(3, 2), dtype=torch.int64), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(3,), dtype=torch.int64), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(1, 109200, 8, 3, 4, 2),
           grad_fn=<AddBackward0>), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(1, 109200, 8, 3, 4),
           grad_fn=<ViewBackward0>), cuda:0), 64), **{}):
The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
(scroll up for backtrace)

Write a meta implementation for the operator. Check out The dynamic shapes manual - Google Docs
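Concretely, that means registering a kernel under the Meta dispatch key that only does shape/dtype computation and never touches tensor data. A minimal sketch, assuming a hypothetical custom op mylib::my_custom_function defined elsewhere via TORCH_LIBRARY (the details for your op will differ; see the manual linked above):

import torch

# The op itself is assumed to be defined in C++, e.g.
#   TORCH_LIBRARY(mylib, m) { m.def("my_custom_function(Tensor x, Tensor y) -> Tensor"); }
meta_lib = torch.library.Library("mylib", "IMPL")

def my_custom_function_meta(x, y):
    # Only metadata arithmetic here: compute the output's shape/dtype without
    # reading any tensor data.
    return x.new_empty((x.shape[0], y.shape[-1]))

meta_lib.impl("my_custom_function", my_custom_function_meta, "Meta")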

I see, it seems like directly registering to meta is only supported for smaller ops? I got this error when trying it on this kernel, for example: Deformable-DETR/ms_deform_attn_cuda.h at 11169a60c33333af00a4849f1808023eba96a931 · fundamentalvision/Deformable-DETR · GitHub

RuntimeError: We should not register a meta kernel directly to the operator 'ms_deform_attn', because it has a CompositeImplicitAutograd kernel in core. Instead we should let the operator decompose, and ensure that we have meta kernels for the base ops that it decomposes into.

Looking through the doc that you sent, am I supposed to decorate it so it decomposes? In my case, something like this?

@torch.ops.deformable_attention_ops.py_impl(DispatchKey.CompositeImplicitAutograd) # the ops name here
def ms_deform_attn(arg1, arg2...):  # the function name here

(It’s probably not correct, as I got AttributeError: '_OpNamespace' 'deformable_attention_ops' object has no attribute 'py_impl'. But I also saw the “The place to put the operator (torch/_decomp or torch/_refs)” section, and I’m not sure if it applies to my own custom ops? That looks like it’s referring to ops that PyTorch wants to expose to the public?)