State of symbolic-shapes branch: Sep 17 edition
The symbolic-shapes branches (PyTorch: Symbolic shapes, pytorch/pytorch#84246, by ezyang; torchdynamo: [WIP branch] symbolic shape hacking, pytorch/torchdynamo#1180, by Chillee) are long-running branches in PyTorch and torchdynamo containing a large number of features and bugfixes related to dynamic shapes support in PyTorch.
Commit IDs at time of writing: pytorch e508e5ce3adaa3464f210e26e738e53d4ec4718c; torchdynamo 3ddb46e873c2bdd1c59217a128b9b2b7af8696fe
Executive summary
We started this branch three weeks ago to move more quickly on adding dynamic shapes support to PyTorch, as getting past master CI was a bottleneck for our work. We made a lot of progress: this branch successfully runs pytorch_BERT forward/backward with the no-op AOTAutograd backend, and the forward graph is compilable by Inductor, producing a kernel that we have verified works with varying batch sizes without inducing recompilation.
From this work, we discovered that tracing with dynamic shapes is quite slow. Over the last week, we made a lot of progress optimizing tracing-time overhead (on devfair040, we went from 116.69s to 56.68s for E2E pytorch_BERT forwards-backwards on aot-nop); most of the overhead stemmed from generating too many FX nodes for symbolic ints.
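To give a flavor of the fix (the "lazily generate FX nodes for PySymInts" item below): instead of eagerly emitting one FX node per SymInt operation, arithmetic on symbolic ints just builds up a sympy expression, and an FX node is materialized only when a tensor operation actually consumes the SymInt. A toy sketch of the idea, not the branch's actual code:
```python
# Toy sketch of lazy FX node creation for symbolic ints; LazySymInt and
# materialize are illustrative names, not the branch's actual classes.
import sympy
import torch.fx

class LazySymInt:
    def __init__(self, expr: sympy.Expr):
        self.expr = expr      # symbolic value, always available for guards
        self.fx_node = None   # FX node, only created on demand

    def __mul__(self, other):
        rhs = other.expr if isinstance(other, LazySymInt) else sympy.Integer(other)
        return LazySymInt(self.expr * rhs)   # no FX node emitted here

    def materialize(self, graph: torch.fx.Graph):
        # Called only when a tensor op actually consumes this SymInt; eager
        # tracing would have emitted one node per arithmetic op above.
        if self.fx_node is None:
            self.fx_node = graph.call_function(sympy.sympify, (str(self.expr),))
        return self.fx_node
```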
Based on our progress and discussions with Jason Ansel, we are now pivoting to enabling dynamic shapes by default for all our benchmark models, so that we can start exercising inductor on dynamic shapes. This involves merging the symbolic-shapes branch to master and blitzing operator support for torchbench.
Current branch diff: 74 files changed, 2880 insertions(+), 506 deletions(-)
What should I expect to work?
We are currently using the following command to test end-to-end:
TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/torchbench.py --only BERT_pytorch --accuracy-aot-nop --training
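If you want to check the no-recompilation claim yourself, one way (not part of the branch) is to count backend invocations with a stand-in no-op backend:
```python
# Minimal sketch: count how many times dynamo invokes the backend compiler.
# my_compiler is a stand-in no-op backend, not part of the branch.
import torch
import torchdynamo

compile_count = 0

def my_compiler(gm, example_inputs):
    global compile_count
    compile_count += 1      # each call here is one (re)compilation
    return gm.forward       # run the captured graph unchanged

def f(x):
    return (x.relu() + 1).sum()

with torchdynamo.optimize(my_compiler):
    for bs in (2, 4, 8):    # vary the batch dimension
        f(torch.randn(bs, 16))

# With TORCHDYNAMO_DYNAMIC_SHAPES=1 we expect compile_count == 1;
# without dynamic shapes, each new batch size recompiles.
print(compile_count)
```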
What’s new in the branch this week?
NB: some changes were added to the branch and then merged to master; those are listed only in the "merged to master" section. All items below are commits on the symbolic-shapes PR (pytorch/pytorch#84246).
- Properly save sym_strides
- unsafe_view functionalize support, restore reshape calls
- Restore singleton CompositeImplicitAutograd check
- Functionalization support for view_copy
- Don’t unnecessarily recompute in functionalize
- Add AOT_DYNAMIC_SHAPES envvar
- Separate guard_int and int; make int fail you again (see the sketch after this list)
- Lazily generate FX nodes for PySymInts (the tracing-time optimization sketched in the executive summary)
- cpp pytree functionality
- Fix bug in computeStorageNBytes
- Fix bug in empty_like
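The "separate guard_int and int" item above deserves a gloss: plain int() on a symbolic int now fails loudly, because it would silently specialize the trace to one concrete value, while guard_int is the explicit opt-in that records a guard and returns the hinted value. A sketch of the semantics; the method bodies here are illustrative, not the branch's code:
```python
# Illustrative semantics only; the real PySymInt lives in the branch's shims.
class PySymIntSketch:
    def __init__(self, expr, hint, guards):
        self.expr = expr       # symbolic expression, e.g. "s0"
        self.hint = hint       # concrete value observed at trace time
        self.guards = guards   # shared list of recorded guards

    def __int__(self):
        # int() fails again: converting silently would bake the hint into
        # the trace with no guard to protect it.
        raise RuntimeError(
            f"cannot convert symbolic int {self.expr} with int(); "
            "use guard_int() to specialize explicitly"
        )

    def guard_int(self):
        # Explicit specialization: record a guard so the compiled code is
        # only reused when the symbol takes this exact value.
        self.guards.append(f"{self.expr} == {self.hint}")
        return self.hint

guards = []
s0 = PySymIntSketch("s0", 64, guards)
print(s0.guard_int())  # 64, and guards now contains "s0 == 64"
```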
What’s made it to master this week?
- New calling convention for Python dispatcher https://github.com/pytorch/pytorch/pull/85133
- Optimize torch.ops.ns.opname.overload accessor in torch dispatch https://github.com/pytorch/pytorch/pull/85132
- Performance optimizations to proxy tensor https://github.com/pytorch/pytorch/pull/85049
- Delete SymIntArrayRef wrapper struct https://github.com/pytorch/pytorch/pull/84837
- Fix bugs in how LTC decides whether or not to use the symint variant of an op https://github.com/pytorch/pytorch/pull/84832
- Added additional simplifications/caching for replacements and divisibility https://github.com/pytorch/pytorch/pull/84918
- empty strided symint https://github.com/pytorch/pytorch/pull/84830
- Added support for symbolic is_contiguous https://github.com/pytorch/pytorch/pull/84829 (see the sketch after this list)
- Remove SymIntNode bits in LTC https://github.com/pytorch/pytorch/pull/85171
- SymInt support for computeStride, multiply_integers, InferSize https://github.com/pytorch/pytorch/pull/84905 (stack), infer_size_dv https://github.com/pytorch/pytorch/pull/84899
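A gloss on the symbolic is_contiguous item above: the contiguity check is now expressed over symbolic sizes and strides instead of requiring concrete ints up front. Here is a rough sketch of the shape of that computation, with sympy standing in for SymInt; in PyTorch proper these comparisons go through SymInt and record guards rather than returning plain bools:
```python
# Rough sketch with sympy standing in for SymInt; not the PR's actual code.
import sympy

def is_contiguous(sizes, strides):
    expected = sympy.Integer(1)
    for size, stride in reversed(list(zip(sizes, strides))):
        if size != 1 and stride != expected:   # size-1 dims never matter
            return False
        expected = expected * size
    return True

s0 = sympy.Symbol("s0", positive=True, integer=True)
# Contiguous for any batch size s0: strides are (128, 1) for sizes (s0, 128).
print(is_contiguous([s0, sympy.Integer(128)],
                    [sympy.Integer(128), sympy.Integer(1)]))  # True
```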
What’s coming next?
High priority:
- Merge into master. Items to merge are annotated with comments on the symbolic-shapes PR. This includes some subtasks:
  - Get min-cut partitioner working with symbolic shapes
  - Figure out what tests are failing on the branch
  - Get fake tensors and AOTAutograd working / integrate inductor/dynamo dynamic shape analysis “properly” (related to sharing fake tensors) https://github.com/pytorch/pytorch/pull/85233
- Full operator coverage for all benchmark models
- Fallback implementation for custom operators that lack symbolic shape propagation rules, inferring output shapes by running the real operator on concrete inputs (can be done later; see the sketch below)
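On the custom-operator fallback item above: the idea is that when an operator has no symbolic shape propagation rule, we can build real tensors from the current size hints, run the real operator once, and read the output shape back. A hypothetical sketch (infer_shape_via_fallback is our name, not an existing API); a real integration would also need to guard on the input sizes the output shape depends on:
```python
# Hypothetical sketch; infer_shape_via_fallback is not an existing API.
import torch

def infer_shape_via_fallback(op, *args):
    # Replace each tensor argument with a real tensor of the same (hinted)
    # shape, run the real operator once, and observe the output shape.
    real_args = [
        torch.zeros([int(s) for s in a.shape], dtype=a.dtype)
        if isinstance(a, torch.Tensor) else a
        for a in args
    ]
    out = op(*real_args)
    # The observed shape is only valid for these hints; guards on the
    # relevant input sizes would be needed to reuse it safely.
    return tuple(out.shape)

print(infer_shape_via_fallback(torch.mm,
                               torch.empty(4, 8), torch.empty(8, 3)))  # (4, 3)
```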
Low priority:
- Figure out why accuracy fails on E2E BERT
- Get inductor working E2E with training on BERT
- Get hf_BERT working (pytorch_BERT is a different model)
- (Low priority atm) Get more models working