State of symbolic shapes branch: Dec 1 edition (eve of PyTorch Conference)
The symbolic-shapes branch (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub) is a long-running branch containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch - #18 by ezyang
Commit ID at time of writing: a05b7b1c73247ff562a82aac0edca79bbaebc2bd
Executive summary
It is the eve of the PyTorch Conference and we have been busy getting things ready for some big announcements. Before and after Thanksgiving, many folks involved with dynamic shapes were deputized to help fix some major release blockers in the general compiler workstream; Brian and Jane landed all of the pieces needed to properly update batch norm running stats, and Alban and Edward found and fixed some more fairly major AOTAutograd bugs. On the dynamic shapes front, Voz has been steadily working on getting all of the Dynamo changes passing CI on master; half of the preparatory changes have been landed so far, and the branch has been resync’ed after those merges. There is some regression in the aot_eager pass rate as we remove hacks and redo fixes properly.
- Lazily guarding for duck sizing and views - Google Docs is our plan (Voz + Edward, with some assistance from Horace) for dealing with the major outstanding soundness bug in dynamic shapes: we are not guarding on duck sizing (see the toy sketch after this list). Edward’s previous attempt, Total revamp of how ShapeEnv symbol allocation works by ezyang · Pull Request #89695 · pytorch/pytorch · GitHub, is hopelessly broken on master, so we are redoing it more incrementally with the new game plan. This has involved quite a lot of Dynamo refactoring.
- Batch norm running stats are now properly updated after first draft of input mutation handling for aot autograd and fixes for inductor <> batch norm. Mark Saroufim has confirmed this resolves training instability when using inductor (turns out, batch norm running stats matter, whodathought); a short illustration of why those stats count as mutated graph inputs follows this list. Alban posted a really nice fix, Fix CopySlices logic to ensure wrapped node runs properly., for a bug that took Voz and Ed two days to track down. But there are still more bugs: AOTAutograd and internal grad_fn structure - Google Docs
- Model training status on symbolic-shapes. See also Symbolic shapes work items tracker - Google Sheets
- OpInfo tests on symbolic shapes.
  - pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive - TODO
  - pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive - TODO
- Previous branch diff: 68 files changed, 2612 insertions(+), 554 deletions(-)
- Current branch diff: 68 files changed, 1440 insertions(+), 290 deletions(-)
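To make the duck sizing soundness problem concrete, here is a toy sketch. `ToyShapeEnv` and its methods are made up for illustration and are not the real ShapeEnv API (which lives in torch.fx.experimental.symbolic_shapes); the point is only that reusing a symbol for two dimensions that happen to have the same concrete size is an assumption the compiled code depends on, and without a guard that assumption goes unchecked on later calls.

```python
import sympy

# Toy illustration only -- not the real ShapeEnv.
class ToyShapeEnv:
    def __init__(self):
        self.size_to_symbol = {}  # duck sizing: same concrete size -> same symbol
        self.guards = []          # equalities the compiled code now depends on

    def create_symbol(self, source, hint):
        if hint in self.size_to_symbol:
            # Duck sizing reuses the existing symbol. The traced graph treats the
            # two dimensions as interchangeable, so we must guard that they are
            # actually equal the next time the compiled code runs.
            sym, orig_source = self.size_to_symbol[hint]
            self.guards.append(f"{source} == {orig_source}")
            return sym
        sym = sympy.Symbol(f"s{len(self.size_to_symbol)}", positive=True, integer=True)
        self.size_to_symbol[hint] = (sym, source)
        return sym

env = ToyShapeEnv()
s_x = env.create_symbol("x.size(0)", 8)   # fresh symbol s0
s_y = env.create_symbol("y.size(0)", 8)   # duck-sized to s0, guard recorded
assert s_x is s_y
print(env.guards)  # ['y.size(0) == x.size(0)'] -- the guard the branch is currently missing
```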
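And as a reminder of why the batch norm fix matters for a compiler: in training mode, BatchNorm mutates its `running_mean`/`running_var` buffers as a side effect of the forward pass, and (roughly speaking) those buffers show up as just more inputs to the functionalized graph AOTAutograd traces, so the mutation has to be written back to the original tensors or eval-mode behavior silently diverges from eager. A minimal repro of the eager-mode behavior, using plain PyTorch with no compiler involved:

```python
import torch

bn = torch.nn.BatchNorm1d(4)          # fresh module: training mode, running_mean == 0
before = bn.running_mean.clone()

x = torch.ones(8, 4)
bn(x)  # training-mode forward mutates running_mean / running_var in place

assert not torch.allclose(bn.running_mean, before)
# If a compiled, functionalized graph never writes this update back to the
# original buffers, eval mode (which uses the running stats) quietly drifts
# away from eager -- the training instability Mark observed.
```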
What’s new on the branch these past two weeks?
Metas/decompositions
- Don’t decompose copy (sic) ezyang
Infrastructure
- Make aten.copy preserve strides (hf_Longformer) ezyang
- Retrace backwards in new shape environment ezyang
- Handle some edge cases regarding constant nodes and None SymInt tangents ezyang
- Suppress guards on as_strided call only. ezyang
Debug interpreter
- Make DebugInterpreter work with varying dynamic shapes ezyang
- Properly handle out of order bindings ezyang
- Minor QOL improvement for deferred equality tests ezyang
Dynamo
- Add guard_source for RandomValueSource ezyang
- Don’t use explain() for --explain; instead read it off the counters ezyang
- Remove fake_tensors_available ezyang
- Cond capture with fake tensors actually works; don’t raise in this case ezyang
- Support unspecialized integers with dynamic shapes ezyang
- Easy: These tests work with fake_tensor_propagation on ezyang
- Force test_rng_state to run with fake tensor prop ezyang (nb: was reverted)
- Run optimizer tests with fake tensors ezyang
- Reenable fake_tensor_propagation on test_cudnn_rnn ezyang
- Graph break on torch.tensor failure, allowing maml to run with fake t… ezyang
- Remove fake_tensor_propagation ezyang
- Delay verify correctness wrapping to call site. ezyang
- Don’t support kwargs at runtime in aot_module_simplified ezyang
- Simplify aot_module_simplified by removing top_args/top_kwargs ezyang
- Change aot_module_simplified to take take arguments directly ezyang
- Make aot_module_simplified accept fake tensors ezyang
- Use isinstance test rather than exact type test for wrap to fake ezyang
- Disable cache to restore accuracy ezyang
Inductor
- Sufficient to get inductor working on BERT_pytorch again ezyang
- [UPDATED PROTOTYPE] Use dynamo fake tensor mode in aot_autograd, move… voz/ezyang
- Restore the base fix, which fixes most of the missing symbol errors ezyang
- Restore enable_python_dispatcher on has_mutation analysis ezyang
- Restore RANDOM_VALUE fix ezyang
- Restore TENSOR_MATCH fix ezyang
- Restore stack tracking for sympy symbols ezyang
QOL
- Make log_extract.py able to deal with NotImplementedError ezyang
- print graph breaks by default ezyang
- Dashboard runner cmd anijain
Merge to master retrospective
- Reland “Add single process version of dynamo distributed hf_Bert tests (#89721)” - this got bounced because not enough tests ran on the PR. We added more files that automatically trigger inductor tests.
- Refactor how AOTAutograd backends are defined - this is just an example of a few cases where folks ran inductor CI, got an accuracy failure on a model, and then spent a bunch of time trying to debug what had happened, when in fact the failure was a preexisting master failure. It is not easy to identify these because ciflow/inductor does not run on every master commit.
- Change aot_module_simplified to take take arguments directly - this broke a timm model, and led us on a pretty big chase that eventually revealed that the example inputs being passed to backends did not have the correct requires_grad because they were being cloned (illustrated in the first sketch after this list). This was fixed by refactoring the AOTAutograd-Dynamo integration to not clone example inputs.
- Remove fake_tensor_propagation - this nearly got bounced because it broke some internal users who didn’t have fake tensor support for some operations. Averted because (1) their tests weren’t in CI and (2) it turned out to be pretty easy to add meta tensor support (see the second sketch after this list).
- Don’t unsafely clone autograd meta - this couldn’t be landed because it broke an inductor model, causing it to raise an error where previously it passed. This led to a very long debugging session by Alban until we finally nailed the problem.
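A hedged illustration of the requires_grad failure mode (plain PyTorch semantics, not the exact Dynamo code path): a defensive copy made via `detach()` drops `requires_grad`, and a backend like AOTAutograd decides whether to build a backward graph based on the `requires_grad` of the example inputs it is handed, so unfaithful example inputs can silently produce a forward-only compiled artifact.

```python
import torch

x = torch.randn(4, requires_grad=True)

faithful = x.clone()             # clone preserves requires_grad
unfaithful = x.detach().clone()  # detach-style copying drops it

print(faithful.requires_grad)    # True
print(unfaithful.requires_grad)  # False

# A backend that inspects example inputs to decide whether gradients flow
# would, given `unfaithful`, skip building the backward graph entirely --
# exactly the kind of silent wrong-graph behavior that made this hard to debug.
```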
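For context on why adding meta tensor support tends to be easy: a meta kernel only has to describe the output (shape, dtype, device) without touching data, which is all fake tensor propagation needs since it runs ops on meta-device tensors under the hood. Below is a minimal sketch using a made-up custom op `mylib::twice`; the actual fix registered meta implementations for existing internal aten operators, but the mechanism is the same.

```python
import torch

# Hypothetical custom op used purely for illustration.
lib = torch.library.Library("mylib", "DEF")
lib.define("twice(Tensor x) -> Tensor")

def twice_cpu(x):
    return x * 2

def twice_meta(x):
    # Meta kernel: only describes the output, no data computation.
    return torch.empty_like(x)

lib.impl("twice", twice_cpu, "CPU")
lib.impl("twice", twice_meta, "Meta")

# With a Meta kernel registered, shape propagation on meta/fake tensors can
# trace through the op instead of erroring out.
out = torch.ops.mylib.twice(torch.empty(3, device="meta"))
print(out.shape, out.device)  # torch.Size([3]) meta
```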
What’s made it to master this week?
ezyang
- Add manual meta implementations to quantize_per_tensor.tensor and co
- Guarantee symbol allocation for all sizes/strides/storage offset
- Add definitely_not_01 set to ShapeEnv.
- Reland “Add single process version of dynamo distributed hf_Bert tests (#89721)”
- Refactor how AOTAutograd backends are defined
- Don’t unsafely clone autograd meta
- Implement guard_source on RandomValueSource
- Beef up AOTAutograd logging with aot_id and input descriptions
- Use isinstance test rather than exact type test for wrap to fake
- Make aot_module_simplified accept fake tensors
- Change aot_module_simplified to take take arguments directly
- Simplify aot_module_simplified by removing top_args/top_kwargs
- Don’t support kwargs at runtime in aot_module_simplified
- Delay verify correctness wrapping to call site.
- Ablate _torchdynamo_orig_callable in wrap_compiler_fn
- Don’t suppress exceptions from backends
- Remove fake_tensor_propagation
- Support unspecialized integers with dynamic shapes
- Remove fake_tensors_available
- Access named parameters/buffers/etc via getattr rather than index
- Suppress guards on as_strided call only.
- Don’t use explain() for --explain; instead read it off the counters
- Add crossref debug mode for functionalization, catches stride errors
- Make aten.copy preserve strides (hf_Longformer)
- Don’t decompose copy (sic)
- Bind DispatchKey.Functionalize in pybind11
- Suppress guards when creating fake tensors
- When dealing with dupe arguments, prefer leafifying if possible
- Add debug asserts to AOTAutograd for input consistency with compilation
- Factor input deduplication into a separate function
bdhirsh
- don’t run input mutation analysis in dynamo
- fixes for inductor <> batch norm
- first draft of input mutation handling for aot autograd
anjali411
nkaretnikov
- [primTorch] Unify checks for embedding
- [primTorch] Add decomp for embedding_renorm_
- Symintify embedding
- Symintify select
voz
- Add simple assert to detect fake tensors on modules
- Fix try/except flow where DataDependentOutputException is getting wrapped in a RuntimeError
albanD
- Fix CopySlices logic to ensure wrapped node runs properly. and beef up inplace/view note on copy slices
What’s coming next?
- Land fake tensor propagation from Dynamo to AOTAutograd (voz)
- ShapeEnv revamp to get guards for duck sizing (ezyang)
- GuardEnv for non-shape related extra guards produced by AOTAutograd (voz)
- Address CI comments for AOTAutograd input mutation, factoring it to be more modular (bdhirsh)
- Proper inductor integration (Chillee didn’t end up working on it, unallocated; mildly blocked on ShapeEnv revamp)
Our north star:
- All benchmark models are passing aot_eager and inductor training on branch
- Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators
- All OpInfo tests passing
- Dynamic shapes on by default for developers / users