State of symbolic shapes branch: Nov 12 edition
The symbolic-shapes branch (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub) is a long-running branch containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch - #16 by ezyang
Commit ID at time of writing: 807a62fc61bea26707c3dc09a12bad204e375a95
Executive summary
This was a chaotic week. Meta had layoffs for the first time in its history (this workstream was not directly affected). We still made progress on this workstream (merge to master, inductor support, operator coverage), but we also discovered more work to do (more dynamo bugs, dynamic shape guard problems, more accuracy failures). Some big work items (dynamo merge to master, input mutation, copy-on-write tensors) have progressed, but are still short of actually landing to master (or even to the branch, as the case may be). Merge to master is also slow going, as we often have to add OpInfo support before we can merge our changes.
- Staffing. Nikita Karetnikov from Quansight is pivoting from general PrimTorch work to working on decompositions / meta implementations that the dynamic shapes workstream can specifically benefit from. Welcome Nikita! In other news, Edward is on jury duty next week.
- Design. We have shifted the “stride agnostic IR” concept to a more expansive “stride agnostic PyTorch” concept, where we make eager mode PyTorch as a whole less sensitive to stride changes. This includes a new design for Copy-on-write tensors for eager PyTorch - Google Docs, which aims to eventually make the BC-breaking change to reshape()/contiguous()/etc. so that these functions always return contiguous tensors (see the reshape sketch after this list). A PoC PR for the entire design exists (Copy on write reshape by ezyang · Pull Request #88774 · pytorch/pytorch · GitHub) and fully passes non-trunk CI, but there are some unresolved questions, such as whether to integrate data pointer reference counting more deeply into Storage to reduce the overall level of indirection, and whether the proposed warning strategy is too loud. This pair of proposals was discussed in the most recent Composability meeting; there were no major objections, but also a desire to better understand the implications of the change.
- Make silent errors noisy. A big reason why our aot_eager pass rate regressed this week is that we turned on more stringent error checking in the branch, to try to transform potential bugs into assertion failures. This week, we: (1) assert that sizes/strides of intermediate tensors are consistent between fake and real tensors (a sketch of this style of check appears after this list), (2) assert a functional-only graph after lowering (this turns the batch norm problem we observed last week into a hard error; to bypass some of these errors, we disabled AOTAutograd from running on subgraphs with BatchNorm), (3) assert that all guards correspond to tensors dynamo knows about (this flushed out a problem with symbolic shapes guards, where dynamo was not tracking enough guards; we fixed one of the problems, but that didn’t hit all of the issues, so we also have a way of suppressing these failures). Unfortunately, while these changes did nail some accuracy problems, we still have new accuracy failures on the latest model runs.
- Inductor has problems. The branch now has some quick and dirty hacks which substantially improved the inductor pass rate (+16 working models), but there are still many bugs causing lots of models to fail for similar reasons. The conclusion is that some fundamental pieces of the dynamic shapes inductor integration don’t exist yet (though @Chillee assures me they’re easy to do). On the bright side, the uniformity of inductor errors means there probably aren’t that many distinct bugs to fix.
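To make the copy-on-write motivation above concrete, here is a small standalone illustration (not code from the branch or the PoC PR) of why reshape() is awkward today: whether it aliases its input depends on the input’s strides.

```python
import torch

# Today, whether reshape() aliases its input depends on whether the requested
# shape can be expressed as a view of the input's strides; callers cannot rely
# on either behavior, which is what the copy-on-write design aims to clean up.
base = torch.randn(2, 3)

v = base.reshape(6)                        # contiguous input: this is a view
print(v.data_ptr() == base.data_ptr())     # True: memory is shared

t = base.t()                               # transpose makes it non-contiguous
c = t.reshape(6)                           # cannot be expressed as a view: this is a copy
print(c.data_ptr() == t.data_ptr())        # False: fresh allocation
```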
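In the same spirit, the fake/real consistency assert from the “make silent errors noisy” item can be sketched as a tiny FX interpreter. This is not the branch’s DebugInterpreter: it assumes the traced graph stores fake-tensor metadata under node.meta["val"], handles only the single-tensor-output case, and glosses over substituting symbolic sizes with concrete values before comparing.

```python
import torch
import torch.fx

class ShapeCheckingInterpreter(torch.fx.Interpreter):
    """Re-run a traced graph on real inputs and check each node's real
    size/stride against the (fake) metadata recorded at trace time."""

    def run_node(self, n: torch.fx.Node):
        result = super().run_node(n)
        expected = n.meta.get("val")  # fake tensor recorded during tracing, if any
        if isinstance(expected, torch.Tensor) and isinstance(result, torch.Tensor):
            assert tuple(result.shape) == tuple(expected.shape), (
                f"{n.format_node()}: size mismatch "
                f"{tuple(result.shape)} vs {tuple(expected.shape)}"
            )
            assert result.stride() == expected.stride(), (
                f"{n.format_node()}: stride mismatch "
                f"{result.stride()} vs {expected.stride()}"
            )
        return result
```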
- Model training status on symbolic-shapes. (@ezyang) See also Symbolic shapes work items tracker - Google Sheets
  - aot_eager, with TDS: 135 out of 163 (-2 WoW) logs csv. Note that this run skips all subgraphs with BatchNorm and suppresses dynamo guard asserts (in the spreadsheet, this is noted as BN+IA).
  - inductor, with TDS: 24 out of 163 (+16 WoW) (logs too long) csv
  - Lowlight: jx_nest_base and twins_pcpvt_base are failing with accuracy errors. Interestingly, this pair of models was previously failing with accuracy errors without dynamic shapes; both were fixed by a single PR https://github.com/pytorch/pytorch/pull/85417. jx_nest_base is minifiable, although the minifier failed with a reduction error on int when I tried running it (I did not try very hard). twins_pcpvt_base was passing on 10/28 and regressed into a timeout on 11/2 (after voz’s major dynamo change hit master); jx_nest_base has never passed.
  - Highlight: cait_m36_384 and mobilenet_v2_quantized_qat accuracy failures turned into non-accuracy failures after we added DebugInterpreter to aot_eager. cait_m36_384 is now passing; no one has had a chance to investigate mobilenet_v2_quantized_qat.
- OpInfo tests on symbolic-shapes. (@ezyang)
  - pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive: 388 passed (+33 WoW), 334 failed (-36 WoW), 501 skipped (+2 WoW) logs csv
  - pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive: 255 passed (+15 WoW), 213 failed (-15 WoW), 129 skipped (+2 WoW) logs csv
Previous branch diff: 76 files changed, 2341 insertions(+), 533 deletions(-)
Current branch diff: 82 files changed, 2236 insertions(+), 747 deletions(-)
Notable bugs
- Fix buggy unsqueeze_ implementation. I found this error because the unsqueeze OpInfo test was failing (hooray, unit tests). I didn’t directly debug the bug; I just replaced the code wholesale with a straight port of the C++ code (though I suspect the bug was in how dimension wrapping was implemented; the wrapping rule is sketched after this list).
- Fix stride on softmax_backward_data, fixes cait_m36_384 was found via the DebugInterpreter. The interesting thing about this fix is that I had actually made a repro last week, but no one picked it up, and when I went to look at it again the repro no longer failed. Fortunately, DebugInterpreter confirmed that it was indeed a problem with softmax_backward_data.
- More correct fix for upsample_bilinear2d decompositions is a fixup of a previous PR, where I attempted to register a composite implementation for upsample_bilinear2d.default in the Python dispatcher. I initially tried CompositeImplicitAutograd; this did not work, because this operator has an explicit autograd formula (Autograd key registration), and Autograd is higher precedence than CompositeImplicitAutograd. This is easy to work around if you know the correct semantics, but you might expect Python registrations to “override” their C++ variants; we should look into potential API changes to remove this footgun.
- Suppress guards when constructing fake tensors was discovered by chasing down an assert failure from Dynamo when it needed to construct a shape guard involving a symbolic variable whose source it didn’t know. The problem is that we perform non-trivial tensor operations so that, e.g., fake tensors for views are actually views, but this means the base tensor gets its own set of fresh symbolic variables that dynamo doesn’t currently track. Fixing this in Dynamo is a fairly involved refactor, especially because we’d like a design that makes it hard for Dynamo to forget to track tensors (e.g., tie it to fake tensor conversion). In the meantime, we added a config driven via TORCHDYNAMO_IGNORE_ASSERT to allow dynamo to suppress these errors for now.
- call call_method instead of _call_operator_builtin - operator calls in Dynamo don’t properly handle NotImplemented. Brian/Anjali tried to patch over this, but the bugfix was itself buggy, so Voz yanked it out of the branch again. This problem still needs to be solved properly.
- Some infra/QoL commits were about making it easier to debug errors when things fail. For example, if exec fails to compile code, print the code that failed to compile (a small sketch of this pattern appears after this list). Some existing design choices produce unreadable debug output; for example, Change how sympy guard formatting works modifies symbolic shape guards to print with newlines instead of being mashed into one giant unholy expression.
- Fix call_range and remove weird idiom of fallthorugh for dynamic in f…. This failed in a very strange way, and even after Edward identified what the problem was, he couldn’t figure out how to fix it (Voz fixed it later). It would be good to have a better understanding of what the Right™ way to fix these issues in Dynamo is.
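As an aside on the unsqueeze_ item above: the dimension-wrapping rule that a meta implementation has to get right is small but easy to fumble. The sketch below restates the documented unsqueeze semantics; it is not the branch’s code.

```python
import torch

def wrap_unsqueeze_dim(dim: int, ndim: int) -> int:
    # unsqueeze produces a tensor with ndim + 1 dimensions, so the valid
    # range for dim is [-(ndim + 1), ndim], and negative dims wrap by
    # adding ndim + 1 (not ndim, as for most other ops).
    lo, hi = -(ndim + 1), ndim
    if not (lo <= dim <= hi):
        raise IndexError(
            f"Dimension out of range (expected to be in range of [{lo}, {hi}], but got {dim})"
        )
    return dim + ndim + 1 if dim < 0 else dim

x = torch.randn(2, 3)
assert x.unsqueeze(-1).shape == (2, 3, 1)   # -1 wraps to 2
assert wrap_unsqueeze_dim(-1, x.dim()) == 2
assert wrap_unsqueeze_dim(0, x.dim()) == 0
```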
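And the “print the code that failed to compile” QoL change mentioned above boils down to a pattern like this (the function name is hypothetical):

```python
def compile_generated_code(source: str, filename: str = "<dynamo generated>"):
    # Compile dynamically generated guard/wrapper source; if it doesn't parse,
    # show the offending source rather than just an opaque SyntaxError.
    try:
        return compile(source, filename, "exec")
    except SyntaxError:
        print("Failed to compile generated code:\n" + source)
        raise
```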
What’s new on the branch this week?
Meta/decomp support
- Fix buggy unsqueeze_ implementation ezyang
- Apply out_wrapper to diag_embed ezyang
- Pass on missing kwargs to torch.empty ezyang
- Fix inplace registrations for alpha ezyang
- Fix stride on softmax_backward_data, fixes cait_m36_384 ezyang
- Convert aten.upsample_bilinear2d decomp from vec to default [HACK] Get upsample_bilinear2d going More correct fix for upsample_bilinear2d decompositions Add missing int cast in upsample_compute_output_size ezyang
SymInt support
Infrastructure
- Add is_view to OpOverload ezyang
- [HACK] move batch_norm segfault to the overall skip section ezyang
- Add a DebugInterpreter, and make aot_eager use it by default. ezyang
- Suppress guards when constructing fake tensors ezyang
- Symbolic shape floor and sym_sqrt SherlockNoMad
Quality of life
- Make hasattr get_dtype assert error slightly more useful ezyang
- [QOL] Make suppressing xfails optional with SUPPRESS_XFAILS ezyang
- Suppress original traceback ezyang
- [QOL] some more test runner scripts ezyang
- aot_autograd: add assert for functional-only graph bdhirsh
Dynamo
- call call_method instead of _call_operator_builtin anjali411
- return the value as is in evaluate_expr if the val is not symbolic anjali411
- makes BigBird pass with AOT_DYNAMIC_SHAPES=1 at least anjali411
- [dynamo][symbolic-shapes] Refactor symbolic shapes dynamo, make tests… voz
- Add a config driven via TORCHDYNAMO_IGNORE_ASSERT to allow dynamo to … voz
- Fix typo in assertion error ezyang
- If torch.* API returns something unexpected, graph break instead of a… ezyang
- If exec fails to compile code, print the code that failed to compile ezyang
- Parenthesize lambda so code can have newlines in it ezyang
- Change how sympy guard formatting works ezyang
- Fix call_range and remove weird idiom of fallthorugh for dynamic in f… voz
- Make it easier to tell what is returning NotImplemented in the analys… ezyang
- Enable python dispatcher when doing mutation analysis ezyang
- [HACK] Assume batch norm does input mutation, exclude it from AOTAuto…
Inductor
- [HACK] Handle symbolic tensors when static_sizes_strides is called + Slightly more correct version ezyang
- [inductor] Try to handle SymInt args correctly ezyang
- [inductor] [HACK] Ignore SymInt in a bunch of places of codegen ezyang
- [HACK] make hf_GPT2 work again ezyang
Merge to master retrospective
- Meta registrations, Nov 7 edition, part 1 had to be split into its small constituent PRs, because CI on the mega PR was failing on a fixed set of seemingly unrelated tests, even after rebase. However, the problem evaporated when everything was landed individually, so it’s not really clear what the moral of the story is.
- reland “fix as_strided_scatter_backward (#87646)” was reverted because it broke some trunk jobs. It looks like this was resolved by adding more xfails/skips: reland "fix as_strided_scatter_backward (#87646)" by bdhirsh · Pull Request #88342 · pytorch/pytorch · GitHub
- Some meta function merges are slow because OpInfo coverage is insufficient. There are two common themes. First, sometimes there is an OpInfo for a given operation, but it covers many different overloads/dim specializations of the function, while we only implemented one overload in a PR. To test easily, we have to extract one particular overload from the OpInfo, or go ahead and implement all of the overloads so the mega OpInfo works. (Arguably, this is a bad decision in OpInfo design.) Second, many missing OpInfos relate to backward functions. Hypothetically, these can be tested indirectly via the forward-backward call that test_aotdispatch.py performs, but in practice it seems easier to just add the backward OpInfo. (The sketch below shows roughly what the exhaustive symbolic tests exercise.)
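For context, here is a rough, heavily simplified sketch of what the symbolic exhaustive tests do (the real tests in test_proxy_tensor.py and test_aotdispatch.py iterate over every OpInfo sample input): trace an op under make_fx in symbolic mode, so that any operator hit during tracing that lacks a meta/symbolic implementation fails loudly.

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

# Trace a single op call with symbolic shapes; if any operator encountered
# during tracing has no meta/symbolic implementation, this raises.
def f(x):
    return torch.ops.aten.unsqueeze.default(x, 0)

gm = make_fx(f, tracing_mode="symbolic")(torch.randn(2, 3))
print(gm.graph)  # graph with symbolic sizes instead of baked-in constants
```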
What’s made it to master this week?
- albanD
- ezyang
  - Fix XLA symbolic shapes binding
  - Unconditionally enable python dispatcher in AOTAutograd
  - Mark diag.out composite
  - Meta registrations for inplace operators
  - Meta implementation for unsqueeze_
  - Meta implementation for bernoulli
  - Support diag_embed.out decomposition
  - Minor error message improvements on meta functions
  - Propagate layout and pin memory in randint to inner constructor
  - Mark as_strided_ as supporting SymInt in C++
  - Correctly test that dtype/device match in generated .out kernels for composites
  - Add support for symbolic shapes to sparse tensor
  - SymIntify _copy functionalization kernels (and _copy_out too)
- bdhirsh
- anjali411
- SherlockNoMad
- nkaretnikov
What’s coming next
By person:
- bdhirsh: fix input mutation for AOTAutograd (same as last week, progress at first draft of input mutation handling for aot autograd by bdhirsh · Pull Request #88817 · pytorch/pytorch · GitHub )
- voz: merging the first round of dynamo patches to master; afterwards, aot autograd cache, restore builtins support in dynamo, fix symbolic shape guard construction in dynamo, refactor dynamo storage of dynamic shapes
- ezyang: run some more experiments on copy-on-write
- Chillee: fix sympy infinite loops, get inductor training working with dynamic shapes
- anjali411: land more branch changes to master, run sweeps this week
- nkaretnikov: continue working on test_proxy_tensor coverage
Our north star:
- All benchmark models are passing aot_eager and inductor training on branch
- Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators
- All OpInfo tests passing
- Dynamic shapes on by default for developers / users