State of symbolic shapes branch: Dec 12 edition
Previous update: State of symbolic shapes branch - #19 by ezyang
Commit ID at time of writing: bcb284d77fe865373b2f1617867320fb32ea68af
Executive summary
We master now peeps! We are officially retiring the symbolic-shapes branch. There are still some changes that need to be merged to master (see the “Merge to master” sheet in Symbolic shapes work items tracker - Google Sheets), but the major fixes for shape guard creation have landed to master, so all that remains on the branch are some QOL fixes (in particular, the debug interpreter is no longer on by default), a little bit of op coverage, and some experimental code (esp. for inductor integration) that needs to be rewritten anyway.
- The big-ticket PRs that landed on master are Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time [Merger of 89672 and 89773] and Completely redo how ShapeEnv guards are generated; together, we now have complete and accurate dynamic shape guards for Dynamo+AOT Autograd (see the first sketch after this list for what a shape guard is). Inductor is the next frontier!
- Add unbacked symints support; item works now is one step on the path to tracing through item() calls properly. It is, unfortunately, not so useful right now: although you can trace item(), the resulting SymInt can’t be used for much, because you’re not allowed to guard on it (see the second sketch after this list). We need to design a system where the user can insert runtime checks that can then be used to discharge later constraints. This will require designing an entailment system for SymPy. @eellison has expressed interest in this.
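To make “shape guard” concrete, here is a minimal sketch of where one comes from. This is illustrative only: it assumes the era-appropriate torch._dynamo.config.dynamic_shapes flag and the "aot_eager" backend, and the guard spelling in the comments is approximate, not verbatim.

```python
import torch
import torch._dynamo

# Sketch under assumptions: dynamic shapes enabled via the config flag
# of this era; the guard text in comments is approximate.
torch._dynamo.config.dynamic_shapes = True

def f(x):
    # Branching on a size is what creates a shape guard: the compiled
    # graph below is valid only under, roughly, "x.size(0) > 2".
    if x.size(0) > 2:
        return x * 2
    return x + 1

opt_f = torch._dynamo.optimize("aot_eager")(f)
opt_f(torch.randn(4, 8))  # traces the x * 2 branch; guard recorded
opt_f(torch.randn(2, 8))  # guard fails, so Dynamo recompiles for this case
```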
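And a sketch of the unbacked-SymInt limitation: tracing item() works now, but the moment anything needs to guard on the result, tracing fails, because there is no real value backing the symbol. Again illustrative only; the exact failure mode and error text are approximate.

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

# Illustrative sketch only (make_fx API as of this era).
def h(x):
    n = x.sum().item()  # this now traces: n becomes an unbacked SymInt
    if n > 0:           # ...but branching needs a guard on n, which is
        return x * 2    # disallowed (no real value backs the symbol),
    return x            # so tracing raises here instead

make_fx(h, tracing_mode="symbolic")(torch.tensor([2, 3]))
```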
Model training status on symbolic-shapes. See also Symbolic shapes work items tracker - Google Sheets
- aot_eager: 141 out of 169 (+5 WoW) logs (sheet: aot_eager 12/11 + localfix). Note that these results are with this patch applied: gist:2a4894db201d35b645cb700db92b54bb · GitHub
- inductor: 5 out of 169 (-31 from branch) logs (sheet: inductor 12/12). This regression is because we have lost all the inductor-related hacks that were on the branch.
OpInfo tests on symbolic shapes.
- pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive: 508 passed (+6 WoW), 522 skipped (+10 WoW), 230 xfailed (+1 WoW)
- pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive: 281 passed (+32 WoW), 141 skipped (+8 WoW), 208 xfailed (-3 WoW)
Previous branch diff: 68 files changed, 2612 insertions(+), 554 deletions(-)
Current branch diff: 0 files changed, 0 insertions(+), 0 deletions(-)
Notable bugs
- It turns out checkpointing doesn’t operate on ShapeEnv, but it should. Dynamo uses checkpoints to roll back its internal state when executing an instruction (or many instructions, in the case of an inlined function call) fails, so that it can pretend those instructions never executed. However, because ShapeEnv isn’t checkpointed, shape guards created during the rolled-back instructions still end up getting installed. This can result in a hard error if those guards refer to variables that the outer context doesn’t know about (we think hf_Reformer and swin_base_patch4_window7_224 are affected by this). Checkpointing the ShapeEnv performantly is nontrivial, because we refine the context with equalities and use them to drive sympy simplification, all of which would need to be undone on rollback. This bug is still unfixed; see the sketch after this list for what a naive checkpoint would involve.
- Preserve original GraphArgs for shape guard codegen and Rewrite dynamo cond() handling to not recursively call export are both fixes for pretty interesting bugs, if I do say so myself. Go check out their PR descriptions for more details.
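As a concrete picture of the checkpointing gap above, here is a naive copy-based sketch. All of it is hypothetical: the field names guards and replacements are assumptions about ShapeEnv’s mutable state, not the real API, and the point of the writeup above is precisely that a wholesale snapshot like this is too slow to do per-instruction.

```python
# Hypothetical sketch -- not the actual Dynamo/ShapeEnv API. Assumes
# ShapeEnv keeps its refinements in mutable fields such as `guards`
# (accumulated shape guards) and `replacements` (symbol equalities
# that drive sympy simplification).
class ShapeEnvCheckpoint:
    def __init__(self, shape_env):
        # Snapshot the mutable state wholesale (cost proportional to
        # the size of the state, on every checkpoint).
        self.guards = list(shape_env.guards)
        self.replacements = dict(shape_env.replacements)

    def restore(self, shape_env):
        # Rolling back drops guards installed after the snapshot and
        # forgets equalities learned since then -- exactly what Dynamo
        # would need when it pretends instructions never executed.
        shape_env.guards = list(self.guards)
        shape_env.replacements = dict(self.replacements)
```

A performant version would presumably log mutations and undo them on rollback rather than copying everything up front; that design work is what remains open.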
What’s made it to master this week?
ezyang
- Preserve original GraphArgs for shape guard codegen
- Improve dynamo debug logging
- Slightly improve error messages on sympy failure
- Add a timeout to benchmark script
- Add unbacked symints support; item works now
- Completely redo how ShapeEnv guards are generated
- SymIntify resize_ and deduplicate memory format logic
- Add missing infer_size_symdimvector implementation.
- Revert guaranteed symint allocation
- Keep track of source name on all allocated SymInts
- Rewrite dynamo cond() handling to not recursively call export
- Ensure that we fakeify tensor subclasses when they are initially tracked
- ShapeEnv.create_symbolic_sizes_strides_storage_offset
nkaretnikov
voz
- Shape guard structure
- Make torch._guards, shuffle structures around for migration
- Add TORCH_FAKE_TENSOR_DEBUG, use it to enable storage of traces on fake tensors at init time
- Light refactor to how we get shape_env for graph lowering
- Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time [Merger of 89672 and 89773]
SherlockNoMad
What’s coming next?
By Person:
- voz: Guard refactor in dynamo
- ezyang: burn down symbolic shapes, fix bugs, work on exporting all shape expressions to form a benchmark, aot autograd default api maybe?
- bdhirsh: continue fixing AOTAutograd v2 follow up bugs
- jbschlosser: merge to master tasks, burn down symbolic shapes
- unallocated: inductor integration
Our north star:
- All benchmark models are passing aot_eager and inductor training on master
- Fallback implementation for custom operators without symbolic shape propagation, inferred by running the fallback on real tensors
- All OpInfo tests passing
- Dynamic shapes on by default for developers / users