State of symbolic shapes branch: Oct 30 edition
The symbolic-shapes branch (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub) is a long-running branch containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch - #8 by ezyang
Commit ID at time of writing: 121e8ebcc2fc50e5ca28cfb3ad437596084424e1
Executive summary
- Voz enabled propagation of symbolic shapes in dynamo (previously, symbolic shapes were only propagated in AOTAutograd), and this uncovered a large number of previously unknown bugs and coverage problems in TORCHDYNAMO_DYNAMIC_SHAPES=1 itself. The team plans to pivot to working on the torchdynamo codebase to help fix these problems.
- @SherlockNoMad found in Meta OpInfo Test for stride correctness by SherlockNoMad · Pull Request #87849 · pytorch/pytorch · GitHub that there are ~70 aten ops whose meta functions produce stride values that don’t match eager’s implementation. Usually this is the result of an incomplete Python meta function or decomposition (our unit tests were not asserting on stride correctness, and we didn’t have enough test cases for strided inputs). Sherlock compiled all the failures into this tracker sheet: Stride Mismatch Tracker - Google Sheets. Bugs on this sheet are open season for burn down (a minimal sketch of the kind of meta-vs-eager stride check involved appears after this list).
- Model training status on symbolic-shapes. (@ezyang) See also Operators that need symbolic support - Google Sheets (out of date).
- Model inference status on symbolic-shapes. (@ezyang)
  - inductor, with TORCHDYNAMO_DYNAMIC_SHAPES=1: 69 out of 177 (new) - logs
- Model inference status on master. (@ezyang)
  - This week is really bad, as we haven’t gotten all of the branch fixes onto master after the dynamo symbolic shapes fallout. With TORCHDYNAMO_DYNAMIC_SHAPES=1: 4 out of 177 (-23 WoW) - logs
- OpInfo tests on symbolic-shapes. (@ezyang)
  - pytest -v test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive - 347 passed (+8 WoW), 374 failed (-7 WoW), 497 skipped (+1 WoW)
  - pytest -v test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive - 238 passed (+19 WoW), 246 failed (-18 WoW), 125 skipped (+0 WoW)
- Previous branch diff: 70 files changed, 1681 insertions(+), 430 deletions(-)
- Current branch diff: 110 files changed, 2788 insertions(+), 2114 deletions(-)
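To make the stride burn down concrete, here is a minimal sketch (not the actual harness from Pull Request #87849) of the kind of meta-vs-eager comparison involved; check_meta_strides is a hypothetical helper, not an existing API:

```python
# Minimal sketch: compare the metadata a meta kernel produces against eager
# for the same op. Non-contiguous inputs are what usually expose mismatches,
# because hand-written meta functions often assume contiguous outputs.
import torch

def check_meta_strides(op, *args):
    real_out = op(*args)
    meta_args = [a.to("meta") if isinstance(a, torch.Tensor) else a for a in args]
    meta_out = op(*meta_args)
    assert meta_out.shape == real_out.shape, (meta_out.shape, real_out.shape)
    assert meta_out.stride() == real_out.stride(), (meta_out.stride(), real_out.stride())
    assert meta_out.dtype == real_out.dtype

# aten.add is structured, so its meta kernel should agree with eager even for
# transposed (non-contiguous) inputs; many ops on the tracker sheet do not.
x = torch.randn(4, 6).t()
check_meta_strides(torch.ops.aten.add.Tensor, x, x)
```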
Notable bug fixes
- Dynamo’s dynamic shapes support is seriously buggy, and it doesn’t seem like there is a clear enough conceptual framework for how things should be implemented to make bug fixes simple. For example, voz in Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub needed to write a page of code just to get dynamic size(i) method calls working. We intend to have a knowledge sharing session with Voz on Tuesday to get the team up to speed on how to approach dynamo bugs.
- We’re still fixing incorrect stride bugs. Delete incorrect and duplicate meta_add_ in particular was a symbolic-shapes-branch-only howler. It is still quite difficult to diagnose these without a minifier; ezyang proposed a mode where aot_eager checks that the real tensors have metadata consistent with the trace, but this isn’t implemented yet (a rough sketch of what such a check could look like appears after this list). Sherlock has a workstream for fixing strides based on the test suite: Stride Mismatch Tracker - Google Sheets. There is still a live problem with softmax: minimal repro gist:54f03e02fd36069bf9693ae2ab707d10 · GitHub
- In parallel to Sherlock’s burn down of stride bugs, @ezyang is investigating whether we can make incorrect stride bugs less severe by preserving reshapes during tracing. This is tricky to do, because we still have to do autograd, and autograd doesn’t directly support reshape (it sometimes returns a view and sometimes returns a fresh tensor). Our plan is to transform reshape into an always-copying operation (a minimal sketch of this idea appears after this list) and add enough extra sanity checking in functionalization to detect when this is not semantics preserving. To do this, we need to run functionalization and autograd at the same time; thus functionalize and compute joint simultaneously. Implementing this was a doozy, as it triggered multiple functionalization bugs:
- A FunctionalTensorWrapper that directly wraps a fake tensor didn’t report its device correctly, and so dispatched to the wrong kernels; this was diagnosed by code reading after staring at the backtrace. We had to both forward device calls to inner tensor in FunctionalTensorWrapper and update functionalization dispatch key strategy so that FunctionalTensorWrapper appropriately pretends to be a Dense tensor.
- We made a LOT of quality of life improvements to the branch this week. A lot of it was simply dogfooding the software and making adjustments when we noticed things could be improved. These range from paper cuts (spammy warnings, overly verbose exceptions) to important debugging tools (printing more program state on failure, e.g., the incomplete make_fx traced graph or the strides of variables in a graph) to important unblocking features (adding a timeout so that sympy infinite loops don’t block model sweeps).
- Use functionalize in symbolic exhaustive tests, not sound otherwise is a howler; someone intentionally turned off functionalization on the test suite, so many tests were failing because AOTAutograd without functionalization isn’t actually sound. Fixing this helped a number of test cases pass. The moral of the story is: don’t give users configuration knobs that are known to be unsound, unless you make it really obvious (or at least, obvious enough that a core dev doesn’t turn that knob on by default in tests, and another core dev approves it in CR).
- Fix bernoulli functionalization was an annoying typofix in the bernoulli implementation for a longstanding problem (long enough that XLA had manually worked around it). It had to be diagnosed manually (by looking at the graphs and noticing some operators were missing after DCE); it could have been caught immediately by testing that no mutating ops remain after functionalization (a sketch of this kind of assert appears after this list), but the PR that implements this is still not landed: aot_autograd: add assert for functional-only graph by bdhirsh · Pull Request #85681 · pytorch/pytorch · GitHub (I must emphasize how important it is to actually land the PRs you write!)
- A runtime error complaining about a shape mismatch in backwards turned out to be a functionalization bug, where we were not copying enough metadata when doing a shallow copy and detach of FunctionalTensorWrapper (which happens when a variable is saved for backwards). Both bdhirsh and ezyang root-caused the problem at about the same time. Fixed in functionalization: fix detach() and tested by Saved tensor that is view test case functionalization.
- This is not a bug fix per se, but an infrastructure improvement: basic bin_op(symint, symint) support was initially done in a brute-force way, adding each overload for mixed operations one by one, but on subsequent code review we didn’t like it. Unify SymIntNode and SymFloatNode into SymNode instead removed the static types from C++, which means you don’t have to add overloads at all. This produced a patch that was functionally equivalent, but a net decrease in total LOC. https://twitter.com/ezyang/status/1585373693852934144
- Prevent use of as_strided in backwards was discovered when inspecting some backward graphs by hand and noticing they had a lot of as_strided calls in them. For reference, typical user programs don’t really ever use as_strided, and neither do most of our autograd derivative formulas, so it is weird that they were showing up. In fact, this is due to how our view mutation logic works, which by default converts all views into AsStrided operations so you don’t have to track the provenance of any single view. This is a nice performance optimization for eager, but it generates code that’s worse for compilers, and in fact XLA already had a way of disabling this shortcut and maintaining the views by hand. Turning this on for PT2 eliminates these as_strided calls. BTW, as_strided calls are preferably avoided in IR because they are extremely sensitive to input strides: if the incoming tensor has different strides, as_strided will silently corrupt your data (a small demonstration of this hazard appears after this list). This would be less of a problem with a “compositional” variant of as_strided that respects input strides (e.g., if a dim already had stride 2, restriding it by 2 would result in stride 2 * 2).
- We have a lot of sympy infinite loops. Fix another sympy infinite loop whackamoles one of them, but there are still more. @Chillee we need to figure out a more sustainable strategy for these.
- Disable aot autograd cache is a nice stopgap: some of our models are now failing because they do exercise dynamic shapes, but our guards are insufficient (because AOTAutograd guards aren’t propagated up to torchdynamo yet). Disabling the AOTAutograd cache means we always recompile at AOTAutograd until this can be fixed.
- Unify meta tensor and fake tensor converter conversion broke a number of inductor model runs on master (the non-dynamic shapes configuration). I was able to fix this by disabling the meta tensor converter’s view preservation (its logic for remaking a tensor into a view if it was originally a view). But in principle it should be OK to do this, which is very perplexing. Some of the current investigation notes are at AOTAutograd has soundness bug when passing views as inputs · Issue #1815 · pytorch/torchdynamo · GitHub
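Below are the sketches referenced in the bullets above. First, the proposed (not yet implemented) aot_eager mode that checks real tensors against the trace: this is only a rough illustration, assuming the fake tensor recorded at trace time is stashed in node.meta["val"]; MetadataCheckInterpreter is a hypothetical name, not an existing API.

```python
import torch
import torch.fx

class MetadataCheckInterpreter(torch.fx.Interpreter):
    """Run a traced GraphModule on real tensors and compare each intermediate
    against the (fake) tensor metadata recorded on the node at trace time."""
    def run_node(self, n):
        result = super().run_node(n)
        expected = n.meta.get("val")  # assumed: FakeTensor saved during tracing
        if isinstance(expected, torch.Tensor) and isinstance(result, torch.Tensor):
            # (With symbolic shapes the expected sizes/strides hold SymInts and
            # would need to be evaluated to concrete hints before comparing.)
            assert result.shape == expected.shape, f"{n.format_node()}: shape mismatch"
            assert result.stride() == expected.stride(), f"{n.format_node()}: stride mismatch"
            assert result.dtype == expected.dtype, f"{n.format_node()}: dtype mismatch"
        return result

# Hypothetical usage: MetadataCheckInterpreter(traced_gm).run(*real_inputs)
```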
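Second, the “reshape always copies” idea from the reshape-preservation item. The branch implements this inside tracing/functionalization; the snippet below is only a behavioral sketch of the intended semantics, with reshape_as_copy as a hypothetical helper:

```python
import torch

def reshape_as_copy(t: torch.Tensor, shape):
    # Unlike aten.reshape (view if possible, copy otherwise), always produce a
    # fresh contiguous tensor, so downstream code never has to reason about
    # whether the result aliases the input or what the input's strides were.
    return t.clone(memory_format=torch.contiguous_format).view(shape)

x = torch.randn(2, 3, 4).transpose(0, 1)        # non-contiguous input
y = reshape_as_copy(x, (3, 8))
assert torch.equal(y, x.reshape(3, 8))          # same values as reshape...
assert y.data_ptr() != x.data_ptr()             # ...but never a view of the input
```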
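Third, an approximation of the “no mutating ops after functionalization” assert that would have caught the bernoulli bug immediately. The real check lives in the not-yet-landed PR #85681; this sketch just inspects operator schemas for written-to arguments:

```python
import torch
import torch.fx

def assert_functional_graph(gm: torch.fx.GraphModule):
    # After functionalization, no node in the graph should mutate its inputs.
    # Detect mutation by looking for schema arguments marked as written to.
    for node in gm.graph.nodes:
        if node.op == "call_function" and isinstance(node.target, torch._ops.OpOverload):
            schema = node.target._schema
            mutated = [a.name for a in schema.arguments
                       if a.alias_info is not None and a.alias_info.is_write]
            assert not mutated, (
                f"mutating op {node.target} (writes to {mutated}) survived functionalization"
            )
```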
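Finally, a small demonstration of why baked-in as_strided calls are hazardous in IR: strides recorded for one input layout silently read the wrong storage locations when the input shows up with a different layout.

```python
import torch

def transpose_via_as_strided(x):
    # A "transpose" expressed with as_strided, whose strides were baked in
    # assuming x is a contiguous 2x3 tensor (strides (3, 1)).
    return x.as_strided((3, 2), (1, 3))

a = torch.arange(6.0).reshape(2, 3)                      # contiguous: matches the assumption
assert torch.equal(transpose_via_as_strided(a), a.t())   # correct

b = a.t().contiguous().t()                               # same shape and values as a, but strides (1, 2)
# No error is raised, but the baked-in strides now index b's storage in the
# wrong order, so the result is silently wrong data rather than b.t().
assert not torch.equal(transpose_via_as_strided(b), b.t())
```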
What’s new on the branch this week?
This time, instead of doing commits chronologically, I’ve tried grouping them by theme.
Meta support
- Fix meta for meta_fill_ SherlockNoMad
- Delete incorrect and duplicate meta_add_ ezyang/Chillee
- meta for topk ezyang
- Logical binary inplace ezyang
- Implement scalar_tensor meta ezyang
- fix meta defaults in meta__fused_moving_avg_obs_fq_helper bdhirsh
SymInt support
- Symintify tensor_split ezyang
- SymInt padding/output padding on convolution; default arg codegen ezyang
- adaptive_avg_pool2d symintification continued anjali411
- Some more meta support for unit tests test_aotdispatch ezyang
- More progress on quantized ezyang
- pass on aten._fused_moving_avg_obs_fq_helper_functional.default ezyang
- SymIntify resize_ ezyang
- SymIntify _copy functionalization kernels (and _copy_out too) ezyang
- Sym support for torch.numel ezyang
Quality of life
- [HACK] Stop rethrowing exceptions ezyang
- [HACK] Make it easier to correlate what you ran and the status ezyang
- enable pybind debugging by default albanD
- [HACK] Don’t rethrow TorchRuntimeError ezyang (plus bug fix Fix missing re-raise in exception fallthrough )
- Improve broadcast_shapes error printing ezyang
- [HACK] Print size hints, not expressions ezyang
- [HACK] Make logging less spammy ezyang
- Make FX collected traceback more useful by stripping format_stack() f… ezyang
- [HACK] Record stack traces on make_fx by default. ezyang
- [HACK] Print incomplete graph upon failure ezyang
- Added stride printing to gm.print_readable() and removed add_meta met… Chillee + update print_readable anjali411
- Ignore log files ezyang
- Add a 5min timeout ezyang
- Distinguish timeout and regular fail, go to stderr ezyang
- [HACK] Disable High loss value assert ezyang
- Report what node failed when running an operation ezyang
- Beefier error message ezyang
- Be a little more accurate in the report ezyang
Dynamo
- Partial cherry pick from vision_maskrcnn bugfix ezyang
- [Dynamo] Symbolic shape guards voz
- Misc dynamo fixes wip cleanups rm comment voz
- [dynamo] Refactor DynamicShapeVariable to be a top level VariableTracker voz (voz/fix_vars branch merge)
- Remove some unnecessary BC code ezyang
- Enable Python dispatcher when running fake tensors in dynamo ezyang
- Fix floor guards in torchdynamo guard codegen ezyang
- Fix slice voz
- Delete questionable symintification of constant ezyang
- Fix dynamo not recording slices in graphs. This led to issues with mi… voz
- Specialize on dynamic shape control flow instead of graph breaking voz
- Fix math.sqrt and size(i) handling in dynamo voz
Functionalization
- Use functionalize in symbolic exhaustive tests, not sound otherwise ezyang
- Fix bernoulli functionalization. ezyang
- Forward device calls to inner tensor in FunctionalTensorWrapper ezyang
- Update functionalization dispatch key strategy ezyang
- Functionalize and compute joint simultaneously ezyang
- Save and restore reapply views TLS correctly ezyang
- Saved tensor that is view test case functionalization ezyang
- functionalization: fix detach() bdhirsh
Infrastructure
- basic bin_op(symint, symint) support and more basic bin_op(symint, symint) support bdhirsh, subsumed by Unify SymIntNode and SymFloatNode into SymNode ezyang
- [HACK] Prevent use of as_strided in backwards ezyang
- Do not use unsafe restriding for subclasses ezyang
- Add get_guard_expr to symbolic_shapes which returns all guards in a s… Chillee
- totally untested sym_sqrt support ezyang
- Get the magic method try reverse protocol correct ezyang
- Fix accuracy minifier ezyang
- Fix another sympy infinite loop Chillee
- disable aot autograd cache anjali411
- Properly pass on shape_env to view base ezyang
- Force people to call from_meta_and_device directly ezyang
- Convert MetaConverter’s tensor memo into a weak value dictionary. ezyang
- Unify meta tensor and fake tensor converter conversion ezyang
Merge to master retrospective
- Many symintifications was reverted because it broke the internal Executorch build, due to an Executorch-only YAML file defining an out variant of an operator (Redirecting...). The fbcode change ended up being trivial, so we relanded the PR by landing it GH1, and then ninja’ing the fbcode fix after it was imported in the diff train. @albanD was initially concerned about merge conflicts between fbcode master and GH master because this was a large patch, but this strategy neatly sidestepped the problem (the biggest annoyance being ensuring the diff train was sufficiently imported to import this diff).
- Fix bernoulli functionalization. required non-trivial XLA changes, but at time of writing @ezyang wasn’t able to compile XLA successfully. Fortunately, JackCaoG helped do the XLA side patch (which was relatively long, but not too difficult; just undoing XLA’s bernoulli hack.)
- Unify meta tensor and fake tensor converter conversion was reverted on master because inductor tests were not run on PR. This will be fixed by enabling inductor CI on all fake tensor changes.
What’s made it to master this week?
- albanD (merge captain)
- ezyang
  - Unify SymIntNode and SymFloatNode into SymNode + Fix pybind11 problems with c10::SymInt unregistered
  - Convert MetaConverter’s tensor memo into a weak value dictionary.
  - Force people to call from_meta_and_device directly
  - Fix bernoulli functionalization.
  - Improve argument printing
  - Fix accuracy minifier
  - as_strided_scatter storage offset defaults to None not 0
- bdhirsh
- anjali411
- Chillee
- voz
What’s coming next?
- Educate the team on how torchdynamo dynamic shapes is supposed to work, and spend a lot of time fixing issues here
- E2E training on master with inductor.
- Hook up torchdynamo’s symbolic shapes to AOT Autograd’s symbolic shapes (check Sym Shapes, Control Flow in Dynamo - RFC + Plan of Record - Google Docs for design details)
- Resolve strategy for sharing ShapeEnv between forward and backwards (@ezyang’s take: don’t return SymInts from forward pass)
- Reshape unification
- All benchmark models passing aot_eager training on the branch; tracked at Operators that need symbolic support - Google Sheets
- Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators
- All OpInfo tests passing