Example inputs to compilers are now fake tensors

(Editor’s note: I meant to send this in December, but forgot. Here you go, later than it should have been!)

The merged PR at Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time [Merger of 89672 and 89773] by voznesenskym · Pull Request #90039 · pytorch/pytorch · GitHub changes how Dynamo invokes backends: instead of passing real tensors as example inputs, we now pass fake tensors, which don’t contain any actual data.

The motivation for this PR is in the dynamic shapes workstream. The essential problem is this: when compiling for dynamic shapes, you don’t want a backend compiler to specialize on the exact sizes the input tensors had. Instead, you want the compiler to be parametric over the sizes, and perhaps only peek at the real sizes occasionally (introducing a guard) when it would really benefit from specializing to a specific size.

There is no way to enforce this if we pass real tensors to backends: the real tensors have all of the real shapes, and don’t say anything about what relationships the sizes have symbolically. However, if we pass fake tensors, we can also replace the concrete sizes with symbolic sizes. If you use those sizes without looking at their concrete values, you end up with a compiled result that works with arbitrary choices of sizes; if you perform boolean tests on the sizes, we automatically introduce guards that limit the validity of the graph to whenever those guards are true.
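To make this concrete, here is a minimal sketch of a custom torch.compile backend that just inspects the example inputs it receives; the backend name and function are invented for illustration, and it assumes the calling convention described in this post (see the update further down the thread for a later change). With dynamic shapes enabled, the sizes of the fake inputs may be symbolic rather than burned-in integers:

import torch

# Hypothetical backend for illustration: a torch.compile backend receives
# (graph_module, example_inputs); after this PR the example inputs are
# FakeTensors, so their sizes may be SymInts instead of concrete ints.
def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    for i, t in enumerate(example_inputs):
        # Printing a shape is harmless; branching on a size (e.g. `if t.shape[0] > 4`)
        # would introduce a guard restricting when this compiled graph is valid.
        print(f"input {i}: {type(t).__name__}, size {tuple(t.shape)}")
    return gm.forward  # just run the captured graph eagerly

@torch.compile(backend=inspecting_backend, dynamic=True)
def f(x):
    return x * 2 + 1

f(torch.randn(8))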

There is also a minor side benefit to passing fake tensors: with real tensors, it’s easy to believe that to perform static analysis on the graph (e.g., shape propagation), you have to actually run the graph (with real tensor inputs). This is very slow (since you’re running the real tensor compute at compile time) and uses up a lot of memory. Fake tensors encourage you to make use of FakeTensorMode, which allows you to run tensor computation without actually doing any of the real work.
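As a minimal sketch of this second point (the import path below is an internal module and may differ between releases): tensors created under FakeTensorMode carry shapes, dtypes, and devices but no real storage, so “running” the computation only propagates metadata:

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    # No real memory is allocated and no real compute happens here.
    x = torch.empty(128, 512)
    w = torch.empty(512, 64)
    y = torch.relu(x @ w)
    print(type(y).__name__, y.shape)  # FakeTensor torch.Size([128, 64])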

Hopefully, the changes needed to make things work with fake tensors are straightforward. Please ask here if you have any questions!


Hi @ezyang, how do you recommend making custom PyTorch ops better support FakeTensor?

I want to run my model with torch.compile; however, the model uses a couple of custom ops, for which I get errors like:

RuntimeError: Failed running call_function my_custom_ops.my_custom_function(*(FakeTensor(FakeTensor(..., device='meta', size=(1, 109200, 8, 16),
           grad_fn=<ViewBackward0>), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(3, 2), dtype=torch.int64), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(3,), dtype=torch.int64), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(1, 109200, 8, 3, 4, 2),
           grad_fn=<AddBackward0>), cuda:0), FakeTensor(FakeTensor(..., device='meta', size=(1, 109200, 8, 3, 4),
           grad_fn=<ViewBackward0>), cuda:0), 64), **{}):
The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
(scroll up for backtrace)

Write a meta implementation for the operator. Check out The dynamic shapes manual - Google Docs

I see, it seems like directly registering to meta is only supported for smaller ops? I got this error when trying it on this kernel, for example: Deformable-DETR/ms_deform_attn_cuda.h at 11169a60c33333af00a4849f1808023eba96a931 · fundamentalvision/Deformable-DETR · GitHub

RuntimeError: We should not register a meta kernel directly to the operator 'ms_deform_attn', because it has a CompositeImplicitAutograd kernel in core. Instead we should let the operator decompose, and ensure that we have meta kernels for the base ops that it decomposes into.

Looking through the doc you sent, am I supposed to decorate it so that it decomposes? In my case, something like this?

@torch.ops.deformable_attention_ops.py_impl(DispatchKey.CompositeImplicitAutograd) # the ops name here
def ms_deform_attn(arg1, arg2...):  # the function name here

(It’s probably not correct, as I got AttributeError: '_OpNamespace' 'deformable_attention_ops' object has no attribute 'py_impl'. I also saw the “The place to put the operator (torch/_decomp or torch/_refs)” section, but I’m not sure it applies to my own custom ops? It looks like it’s referring to ops that PyTorch wants to expose to the public.)

For this op, it should “just work” even without doing anything, because for a composite op, the expectation is that the inner ops it calls have meta implementations. Is that true for your op? You didn’t link the rest of the decomposition implementation so I can’t tell easily.
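To illustrate the expectation for composite ops, here is a rough sketch (the namespace and op name are made up): a CompositeImplicitAutograd registration whose body decomposes entirely into base ATen ops. Because those base ops already have meta implementations, the op works under fake tensors without a meta kernel of its own:

import torch

# Hypothetical op for illustration only.
lib = torch.library.Library("myattn", "DEF")
lib.define("simple_attn(Tensor q, Tensor k, Tensor v) -> Tensor")

def simple_attn_composite(q, k, v):
    # Decomposes entirely into base ATen ops (matmul, softmax), each of
    # which already has a meta implementation.
    scores = torch.softmax(q @ k.transpose(-1, -2), dim=-1)
    return scores @ v

lib.impl("simple_attn", simple_attn_composite, "CompositeImplicitAutograd")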

This is the op: Deformable-DETR/ms_deform_attn_cuda.cu at 11169a60c33333af00a4849f1808023eba96a931 · fundamentalvision/Deformable-DETR · GitHub. It does call a few layers of ops, but after tracing, at the end of the day it just uses + and *, although I don’t think there is a meta implementation for any of those layers.

For this kind of nested op, how should we go about writing its decomposition implementation?

Hey @ezyang, I notice that both the doc you shared above and the examples in torch/_meta_registrations.py deal with ATen ops. Does that mean that for other custom kernels (e.g., flashattention), we need to manually write the meta backend implementation?

Yup! Custom kernels use the same registration mechanism as regular kernels, so you can register metas for them the same way.
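A rough sketch of what that registration can look like with torch.library (the namespace, op, and output shape here are invented for illustration; the meta function only computes output metadata and never touches real data):

import torch

# Hypothetical custom op for illustration.
lib = torch.library.Library("mylib", "DEF")
lib.define("fused_bias_gelu(Tensor x, Tensor bias) -> Tensor")

def fused_bias_gelu_meta(x, bias):
    # Shape/dtype propagation only: the output has the same metadata as x.
    return torch.empty_like(x)

lib.impl("fused_bias_gelu", fused_bias_gelu_meta, "Meta")

# The real kernels are registered under their own dispatch keys,
# e.g. lib.impl("fused_bias_gelu", cuda_impl, "CUDA").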

Update: this post is no longer accurate. PR https://github.com/pytorch/pytorch/pull/99320 changed the calling convention back to real tensors.


Thanks @ezyang for pointing to the nicely written manual on dynamic shapes. Is there similar documentation on at::SymInt → at::IntArrayRef specialization? If not, what’s the best way to learn about SymInt specialization?

P.S.: I think “c10/core/SymIntArrayRef.h” and “c10/core/SymInt.h” are good places to start in the absence of a comprehensive document/tutorial.