State of symbolic shapes branch

State of symbolic shapes branch: Dec 12 edition

Previous update: State of symbolic shapes branch - #19 by ezyang

Commit ID at time of writing: bcb284d77fe865373b2f1617867320fb32ea68af

Executive summary

We master now peeps! We are officially retiring the symbolic-shapes branch. There are still some changes that need to be merged to master (Symbolic shapes work items tracker - Google Sheets “Merge to master” sheet), but the major fixes for shape guard creation have landed in master, so all that remains on the branch are some QOL fixes (in particular, the debug interpreter is no longer on by default), a little bit of op coverage, and some experimental code (especially for inductor integration) that needs to be rewritten anyway.

Previous branch diff: 68 files changed, 2612 insertions(+), 554 deletions(-)
Current branch diff: 0 files changed, 0 insertions(+), 0 deletions(-)

Notable bugs

  • It turns out checkpointing doesn’t operate on ShapeEnv, but it should. Dynamo uses checkpoints to roll back its internal state when execution of an instruction (or of many instructions, in the case of an inlined function call) fails, so that it can pretend those instructions never executed. However, because ShapeEnv isn’t checkpointed, shape guards created during the rolled-back instructions still end up getting installed. This can result in a hard error if the guards refer to variables that the outer context doesn’t know about (we think hf_Reformer and swin_base_patch4_window7_224 are affected by this). Checkpointing the ShapeEnv performantly is nontrivial, as we refine the context with equalities and use that to drive sympy simplification, all of which would need to be undone (see the sketch after this list). This bug is still unfixed.
  • Preserve original GraphArgs for shape guard codegen and Rewrite dynamo cond() handling to not recursively call export are both fixes for pretty interesting bugs, if I do say so myself. Go check out their PR descriptions for more details.
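
To make the checkpointing problem concrete, here is a minimal sketch of what snapshot/restore of a shape environment could look like. `ToyShapeEnv`, its fields, and its methods are hypothetical simplifications made up for illustration, not the real ShapeEnv API; the real object carries much more state (symbol sources, sympy replacements used for simplification), which is exactly why doing this performantly is hard.

```python
class ToyShapeEnv:
    """Hypothetical, stripped-down stand-in for Dynamo's ShapeEnv."""

    def __init__(self):
        self.guards = []        # shape guards accumulated during tracing
        self.replacements = {}  # symbol -> simplified expression (drives simplification)

    def checkpoint(self):
        # Snapshot all mutable state. Copying this on every speculatively
        # executed instruction is why a performant version is nontrivial.
        return (list(self.guards), dict(self.replacements))

    def restore(self, snapshot):
        # Roll back to the snapshot, discarding guards installed since then.
        self.guards, self.replacements = list(snapshot[0]), dict(snapshot[1])


env = ToyShapeEnv()
snap = env.checkpoint()
env.guards.append("s0 == s1")  # guard installed while speculating an instruction
env.restore(snap)              # the instruction failed: pretend it never ran
assert env.guards == []
```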

What’s made it to master this week?

ezyang

nkaretnikov

voz

SherlockNoMad

What’s coming next?

By Person:

  • voz: Guard refactor in dynamo
  • ezyang: burn down symbolic shapes, fix bugs, work on exporting all shape expressions to form a benchmark, aot autograd default api maybe?
  • bdhirsh: continue fixing AOTAutograd v2 follow up bugs
  • jbschlosser: merge to master tasks, burn down symbolic shapes
  • unallocated: inductor integration

Our north star:

  • All benchmark models are passing aot_eager and inductor training on branch
  • Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators (see the sketch after this list)
  • All OpInfo tests passing
  • Dynamic shapes on by default for developers / users
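
On the fallback item above: the idea is to infer output shapes for operators that have no symbolic propagation rule by running the real operator on concrete tensors. Below is a hypothetical sketch of that; `fallback_shape_prop`, its argument format, and the hint mechanism are all made up for illustration and are not an actual PyTorch API. Note the inferred shape is only valid for the particular hint values, which is why this is a fallback rather than a true symbolic rule.

```python
import torch

def fallback_shape_prop(op, arg_metas, hints):
    # Hypothetical helper: materialize real zero tensors by substituting
    # concrete hint values for each symbolic size, run the real op, and
    # read the output shape off the result.
    real_args = [
        torch.zeros([hints.get(d, d) for d in shape], dtype=dtype)
        for shape, dtype in arg_metas
    ]
    return op(*real_args).shape

# Infer the output shape of matmul for inputs (s0, 16) @ (16, s1),
# using example hint values for the symbolic sizes s0 and s1.
out_shape = fallback_shape_prop(
    torch.matmul,
    [(("s0", 16), torch.float32), ((16, "s1"), torch.float32)],
    hints={"s0": 4, "s1": 8},
)
print(out_shape)  # torch.Size([4, 8])
```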