State of symbolic shapes branch: Nov 12 edition
The symbolic-shapes branch (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub) is a long-running branch containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch - #16 by ezyang
Commit ID at time of writing: 807a62fc61bea26707c3dc09a12bad204e375a95
Executive summary
This was a chaotic week. Meta had layoffs for the first time in its history (this workstream was not directly affected). We still made progress on this workstream (merge to master, inductor support, operator coverage), but we also discovered more work to do (more dynamo bugs, dynamic shape guard problems, more accuracy failures). Some big work items (dynamo merge to master, input mutation, copy-on-write tensors) have progressed, but are still short of actually landing to master (or even to the branch, as the case may be). Merge to master is also slow going, as we often have to add OpInfo support before we can merge our changes.
- Staffing. Nikita Karetnikov from Quansight is pivoting from general PrimTorch work to working on decompositions / meta implementations that the dynamic shapes workstream can specifically benefit from. Welcome Nikita! In other news, Edward is on jury duty next week.
- Design. We have shifted the “stride agnostic IR” concept to a more expansive “stride agnostic PyTorch” concept, where we make eager mode PyTorch as a whole less sensitive to stride changes. This includes a new design for Copy-on-write tensors for eager PyTorch - Google Docs, which aims to eventually make the BC-breaking change to reshape()/contiguous()/etc. so that these functions always return contiguous tensors (see the reshape sketch after this list). A PoC PR for the entire design exists (Copy on write reshape by ezyang · Pull Request #88774 · pytorch/pytorch · GitHub) and fully passes non-trunk CI, but there are some unresolved questions, such as whether to integrate data pointer reference counting more deeply into Storage to reduce the overall level of indirection, and whether the proposed warning strategy is too loud. This pair of proposals was discussed in the most recent Composability meeting; there were no major objections, but also a desire to better understand the implications of the change.
- Make silent errors noisy. A big reason why our aot_eager pass rate regressed this week is that we turned on more stringent error checking in the branch, to try to transform potential bugs into assertion failures. This week, we: (1) assert that sizes/strides of intermediate tensors are consistent between fake and real tensors (a sketch of this style of check appears after this list), (2) assert a functional-only graph after lowering (this turns the batch norm problem we observed last week into a hard error; to bypass some of these errors, we disabled AOTAutograd from running on subgraphs with BatchNorm), (3) assert that all guards correspond to tensors dynamo knows about (this flushed out a problem with symbolic shapes guards, where dynamo was not tracking enough guards; we fixed one of the problems, but that didn’t hit all of the issues, so we also have a way of suppressing these failures). Unfortunately, while these changes did nail some accuracy problems, we still have new accuracy failures on the latest model runs.
- Inductor has problems. The branch now has some quick and dirty hacks which substantially improved the inductor pass rate (+16 working models), but there are still many bugs causing lots of models to fail for similar reasons. The conclusion is that some fundamental pieces of the dynamic shapes inductor integration don’t exist yet (though @Chillee assures me they’re easy to do). On the bright side, the uniformity of inductor errors means there probably aren’t that many distinct bugs to fix.
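To make the copy-on-write motivation above concrete, here is a small standalone illustration (not code from the branch or the PoC PR) of why reshape() is awkward today: whether it aliases its input depends on the input’s strides.

```python
import torch

# Today, whether reshape() aliases its input depends on whether the requested
# shape can be expressed as a view of the input's strides; callers cannot rely
# on either behavior, which is what the copy-on-write design aims to clean up.
base = torch.randn(2, 3)

v = base.reshape(6)                        # contiguous input: this is a view
print(v.data_ptr() == base.data_ptr())     # True: memory is shared

t = base.t()                               # transpose makes it non-contiguous
c = t.reshape(6)                           # cannot be expressed as a view: this is a copy
print(c.data_ptr() == t.data_ptr())        # False: fresh allocation
```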
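In the same spirit, the fake/real consistency assert from the “make silent errors noisy” item can be sketched as a tiny FX interpreter. This is not the branch’s DebugInterpreter: it assumes the traced graph stores fake-tensor metadata under node.meta["val"], handles only the single-tensor-output case, and glosses over substituting symbolic sizes with concrete values before comparing.

```python
import torch
import torch.fx

class ShapeCheckingInterpreter(torch.fx.Interpreter):
    """Re-run a traced graph on real inputs and check each node's real
    size/stride against the (fake) metadata recorded at trace time."""

    def run_node(self, n: torch.fx.Node):
        result = super().run_node(n)
        expected = n.meta.get("val")  # fake tensor recorded during tracing, if any
        if isinstance(expected, torch.Tensor) and isinstance(result, torch.Tensor):
            assert tuple(result.shape) == tuple(expected.shape), (
                f"{n.format_node()}: size mismatch "
                f"{tuple(result.shape)} vs {tuple(expected.shape)}"
            )
            assert result.stride() == expected.stride(), (
                f"{n.format_node()}: stride mismatch "
                f"{result.stride()} vs {expected.stride()}"
            )
        return result
```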
- Model training status on symbolic-shapes. (@ezyang) See also Symbolic shapes work items tracker - Google Sheets
  - aot_eager, with TDS: 135 out of 163 (-2 WoW) logs csv. Note that this run skips all subgraphs with BatchNorm and suppresses dynamo guard asserts (in the spreadsheet, this is noted as BN+IA).
  - inductor, with TDS: 24 out of 163 (+16 WoW) (logs too long) csv
  - Lowlight: jx_nest_base and twins_pcpvt_base are failing with accuracy errors. Interestingly, this pair of models was previously failing with accuracy errors without dynamic shapes; both were fixed by a single PR https://github.com/pytorch/pytorch/pull/85417. jx_nest_base is minifiable, although the minifier failed with a reduction error on int when I tried running it (I did not try very hard). twins_pcpvt_base was passing on 10/28 and regressed into a timeout on 11/2 (after voz’s major dynamo change hit master); jx_nest_base has never passed.
  - Highlight: cait_m36_384 and mobilenet_v2_quantized_qat accuracy failures turned into non-accuracy failures after we added DebugInterpreter to aot_eager. cait_m36_384 is now passing; no one has had a chance to investigate mobilenet_v2_quantized_qat.
- OpInfo tests on symbolic-shapes. (@ezyang)
  - pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive: 388 passed (+33 WoW), 334 failed (-36 WoW), 501 skipped (+2 WoW) logs csv
  - pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive: 255 passed (+15 WoW), 213 failed (-15 WoW), 129 skipped (+2 WoW) logs csv
Previous branch diff: 76 files changed, 2341 insertions(+), 533 deletions(-)
Current branch diff: 82 files changed, 2236 insertions(+), 747 deletions(-)
Notable bugs
- Fix buggy unsqueeze_ implementation. I found this error because the unsqueeze OpInfo test was failing (hooray, unit tests). I didn’t directly debug the bug; I just replaced the code wholesale with a straight port of the C++ code (though I suspect the bug was in how dimension wrapping was implemented; the wrapping rule is sketched after this list).
- Fix stride on softmax_backward_data, fixes cait_m36_384 was found via the DebugInterpreter. The interesting thing about this fix is that I had actually made a repro last week, but no one picked it up, and when I went to look at it again the repro no longer failed. Fortunately, DebugInterpreter confirmed that it was indeed a problem with softmax_backward_data.
- More correct fix for upsample_bilinear2d decompositions is a fixup of a previous PR, where I attempted to register a composite implementation for upsample_bilinear2d.default in the Python dispatcher. I initially tried CompositeImplicitAutograd; this did not work, because this operator has an explicit autograd formula (Autograd key registration), and Autograd is higher precedence than CompositeImplicitAutograd. This is easy to work around if you know the correct semantics, but you might expect Python registrations to “override” their C++ variants; we should look into potential API changes to remove this footgun.
- Suppress guards when constructing fake tensors was discovered by chasing down an assert failure from Dynamo when it needed to construct a shape guard involving a symbolic variable whose source it didn’t know. The problem is that we perform non-trivial tensor operations so that, e.g., fake tensors for views are actually views, but this means the base tensor gets its own set of fresh symbolic variables that dynamo doesn’t currently track. Fixing this in Dynamo is a fairly involved refactor, especially because we’d like a design that makes it hard for Dynamo to forget to track tensors (e.g., tie it to fake tensor conversion). In the meantime, we added a config driven via TORCHDYNAMO_IGNORE_ASSERT to allow dynamo to suppress these errors for now.
- call call_method instead of _call_operator_builtin - operator calls in Dynamo don’t properly handle NotImplemented. Brian/Anjali tried to patch over this, but the bugfix was itself buggy, so Voz yanked it out of the branch again. This problem still needs to be solved properly.
- Some infra/QoL commits were about making it easier to debug errors when things fail. For example, if exec fails to compile code, print the code that failed to compile (a small sketch of this pattern appears after this list). Some existing design choices produce unreadable debug output; for example, Change how sympy guard formatting works modifies symbolic shape guards to print with newlines instead of being mashed into one giant unholy expression.
- Fix call_range and remove weird idiom of fallthorugh for dynamic in f…. This failed in a very strange way, and even after Edward identified what the problem was, he couldn’t figure out how to fix it (Voz fixed it later). It would be good to have a better understanding of what the Right™ way to fix these issues in Dynamo is.
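As an aside on the unsqueeze_ item above: the dimension-wrapping rule that a meta implementation has to get right is small but easy to fumble. The sketch below restates the documented unsqueeze semantics; it is not the branch’s code.

```python
import torch

def wrap_unsqueeze_dim(dim: int, ndim: int) -> int:
    # unsqueeze produces a tensor with ndim + 1 dimensions, so the valid
    # range for dim is [-(ndim + 1), ndim], and negative dims wrap by
    # adding ndim + 1 (not ndim, as for most other ops).
    lo, hi = -(ndim + 1), ndim
    if not (lo <= dim <= hi):
        raise IndexError(
            f"Dimension out of range (expected to be in range of [{lo}, {hi}], but got {dim})"
        )
    return dim + ndim + 1 if dim < 0 else dim

x = torch.randn(2, 3)
assert x.unsqueeze(-1).shape == (2, 3, 1)   # -1 wraps to 2
assert wrap_unsqueeze_dim(-1, x.dim()) == 2
assert wrap_unsqueeze_dim(0, x.dim()) == 0
```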
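And the “print the code that failed to compile” QoL change mentioned above boils down to a pattern like this (the function name is hypothetical):

```python
def compile_generated_code(source: str, filename: str = "<dynamo generated>"):
    # Compile dynamically generated guard/wrapper source; if it doesn't parse,
    # show the offending source rather than just an opaque SyntaxError.
    try:
        return compile(source, filename, "exec")
    except SyntaxError:
        print("Failed to compile generated code:\n" + source)
        raise
```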
What’s new on the branch this week?
Meta/decomp support
- Fix buggy unsqueeze_ implementation ezyang
- Apply out_wrapper to diag_embed ezyang
- Pass on missing kwargs to torch.empty ezyang
- Fix inplace registrations for alpha ezyang
- Fix stride on softmax_backward_data, fixes cait_m36_384 ezyang
- Convert aten.upsample_bilinear2d decomp from vec to default [HACK] Get upsample_bilinear2d going More correct fix for upsample_bilinear2d decompositions Add missing int cast in upsample_compute_output_size ezyang
SymInt support
Infrastructure
- Add is_view to OpOverload ezyang
- [HACK] move batch_norm segfault to the overall skip section ezyang
- Add a DebugInterpreter, and make aot_eager use it by default. ezyang
- Suppress guards when constructing fake tensors ezyang
- Symbolic shape floor and sym_sqrt SherlockNoMad
Quality of life
- Make hasattr get_dtype assert error slightly more useful ezyang
- [QOL] Make suppressing xfails optional with SUPPRESS_XFAILS ezyang
- Suppress original traceback ezyang
- [QOL] some more test runner scripts ezyang
- aot_autograd: add assert for functional-only graph bdhirsh
Dynamo
- call call_method instead of _call_operator_builtin anjali411
- return the value as is in evaluate_expr if the val is not symbolic anjali411
- makes BigBird pass with AOT_DYNAMIC_SHAPES=1 at least anjali411
- [dynamo][symbolic-shapes] Refactor symbolic shapes dynamo, make tests… voz
- Add a config driven via TORCHDYNAMO_IGNORE_ASSERT to allow dynamo to … voz
- Fix typo in assertion error ezyang
- If torch.* API returns something unexpected, graph break instead of a… ezyang
- If exec fails to compile code, print the code that failed to compile ezyang
- Parenthesize lambda so code can have newlines in it ezyang
- Change how sympy guard formatting works ezyang
- Fix call_range and remove weird idiom of fallthorugh for dynamic in f… voz
- Make it easier to tell what is returning NotImplemented in the analys… ezyang
- Enable python dispatcher when doing mutation analysis ezyang
- [HACK] Assume batch norm does input mutation, exclude it from AOTAuto…
Inductor
- [HACK] Handle symbolic tensors when static_sizes_strides is called + Slightly more correct version ezyang
- [inductor] Try to handle SymInt args correctly ezyang
- [inductor] [HACK] Ignore SymInt in a bunch of places of codegen ezyang
- [HACK] make hf_GPT2 work again ezyang
Merge to master retrospective
- Meta registrations, Nov 7 edition, part 1 had to be split into its small constituent PRs, because CI on the mega PR was failing on a fixed set of seemingly unrelated tests, even after rebase. However, the problem evaporated when everything was landed individually, so it’s not really clear what the moral of the story is.
- reland “fix as_strided_scatter_backward (#87646)” was reverted because it broke some trunk jobs. It looks like this was resolved by adding more xfails/skips: reland "fix as_strided_scatter_backward (#87646)" by bdhirsh · Pull Request #88342 · pytorch/pytorch · GitHub
- Some meta function merges are slow because OpInfo coverage is insufficient. There are two common themes. First, sometimes there is an OpInfo for a given operation, but it covers many different overloads/dim specializations of the function, while we only implemented one overload in a PR. To test easily, we have to extract one particular overload from the OpInfo, or go ahead and implement all of the overloads so the mega OpInfo works. (Arguably, this is a bad decision in OpInfo design.) Second, many missing OpInfos relate to backward functions. Hypothetically, these can be tested indirectly via the forward-backward call that test_aotdispatch.py performs, but in practice it seems easier to just add the backward OpInfo. (The sketch below shows roughly what the exhaustive symbolic tests exercise.)
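For context, here is a rough, heavily simplified sketch of what the symbolic exhaustive tests do (the real tests in test_proxy_tensor.py and test_aotdispatch.py iterate over every OpInfo sample input): trace an op under make_fx in symbolic mode, so that any operator hit during tracing that lacks a meta/symbolic implementation fails loudly.

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

# Trace a single op call with symbolic shapes; if any operator encountered
# during tracing has no meta/symbolic implementation, this raises.
def f(x):
    return torch.ops.aten.unsqueeze.default(x, 0)

gm = make_fx(f, tracing_mode="symbolic")(torch.randn(2, 3))
print(gm.graph)  # graph with symbolic sizes instead of baked-in constants
```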
What’s made it to master this week?
- albanD
- ezyang
  - Fix XLA symbolic shapes binding
  - Unconditionally enable python dispatcher in AOTAutograd
  - Mark diag.out composite
  - Meta registrations for inplace operators
  - Meta implementation for unsqueeze_
  - Meta implementation for bernoulli
  - Support diag_embed.out decomposition
  - Minor error message improvements on meta functions
  - Propagate layout and pin memory in randint to inner constructor
  - Mark as_strided_ as supporting SymInt in C++
  - Correctly test that dtype/device match in generated .out kernels for composites
  - Add support for symbolic shapes to sparse tensor
  - SymIntify _copy functionalization kernels (and _copy_out too)
- bdhirsh
- anjali411
- SherlockNoMad
- nkaretnikov
What’s coming next
By person:
- bdhirsh: fix input mutation for AOTAutograd (same as last week, progress at first draft of input mutation handling for aot autograd by bdhirsh · Pull Request #88817 · pytorch/pytorch · GitHub )
- voz: merging the first round of dynamo patches to master; afterwards, aot autograd cache, restore builtins support in dynamo, fix symbolic shape guard construction in dynamo, refactor dynamo storage of dynamic shapes
- ezyang: run some more experiments on copy-on-write
- Chillee: fix sympy infinite loops, get inductor training working with dynamic shapes
- anjali411: land more branch changes to master, run sweeps this week
- nkaretnikov: continue working on test_proxy_tensor coverage
Our north star:
- All benchmark models are passing aot_eager and inductor training on branch
- Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators
- All OpInfo tests passing
- Dynamic shapes on by default for developers / users