State of symbolic shapes: Dec 19 edition
Previous update: State of symbolic shapes branch - #20 by ezyang
Commit ID at time of writing: 212873c615dd3455a24d390605335aeeebd76236
Executive summary
This week, we turned on dynamic shapes with aot_eager on CI in the inductor job. Compared with static shapes aot_eager, we have only a 17-failure difference on master! Inductor remains in bad shape on master, as we are still waiting on @Chillee to submit his PR with fixes.
In other news, @ezyang has released a benchmark for reasoning about shape computation: GitHub - ezyang/SMT-LIB-benchmarks-pytorch-shapes: SMT-LIB benchmarks for shape computations from deep learning models in PyTorch. If you work on SMT solvers or like symbolic reasoning systems, check it out! It offers an easy way to test new ideas about how to symbolically reason over shape computations. We still have a number of infinite loops in Sympy; for now, we are simply suppressing all stack overflows induced by Sympy.
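To give a flavor of what these benchmarks exercise, here is a minimal sketch (my own illustration, not taken from the benchmark suite) of the kind of shape entailment a symbolic reasoning backend must decide, written with Sympy since that is what PyTorch currently uses:

```python
import sympy

# Hypothetical shape computation: viewing a (s0, s1, s2) tensor as
# (s0 * s1, s2) is only valid if the element counts agree. A shape
# reasoning backend must decide entailments like this one, where the
# dims are positive integer unknowns rather than concrete values.
s0, s1, s2 = sympy.symbols("s0 s1 s2", positive=True, integer=True)

numel_in = s0 * s1 * s2
numel_out = (s0 * s1) * s2

# Sympy can discharge this trivial case; the benchmarks contain much
# harder (nonlinear integer) queries where Sympy can loop or blow the
# stack, which is what an SMT solver might handle better.
assert sympy.simplify(numel_in - numel_out) == 0
print("shapes compatible")
```

The interesting queries in the benchmark are nonlinear integer arithmetic over unknown dims, which is exactly the fragment where off-the-shelf solvers differ widely in performance.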
- Model training status on master. See also Symbolic shapes work items tracker - Google Sheets
  - aot_eager: -17 (new). We are changing the format of our reporting here: we now track the number of expected failures as recorded in the CI running on master. This means we always have an up-to-date tally of how well we are doing, without having to run a sweep by hand.
  - I did a manual sweep with AOT Autograd refactor + cleanup, handle intermediate views of bases, use view replay, fix non-tensor input handling by bdhirsh · Pull Request #89532 · pytorch/pytorch · GitHub and got aot_eager: 156 out of 169 (+15 WoW) logs (sheet: aot_eager 12/17 w bdhirsh PR). This is a solid improvement, although the PR introduces accuracy failures that did not previously exist.
  - inductor: we did not land any inductor-related changes, so I did not rerun this sweep.
- OpInfo tests on symbolic shapes.
  - `pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive` - 513 passed (+5 WoW), 522 skipped (no change), 227 xfailed (-3 WoW)
  - `pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive` - 286 passed (+5 WoW), 142 skipped (+1 WoW), 203 xfailed (-5 WoW)
Notable bugs
- Despite overhauling ShapeEnv guard production in Dynamo two weeks ago, there were still more stragglers that had to be addressed this week. The main source of problems was a mismatch between when we added a tensor to GraphArgs (because it is an FX graph input) and when we allocated dynamic shapes for a tensor (so we may need to determine the source of its symbolic shape). This led to more refactoring in Dynamo so that we could guarantee that whenever a tensor had symbolic shapes allocated for it, we also tracked it for the purposes of guard creation. This fixed all bugs except one(!), which @ezyang has an open PR set for (involving more refactoring).
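A toy sketch (my own illustration, not the real Dynamo code; all names are hypothetical) of the invariant the refactor establishes: symbolic shape allocation and guard-source tracking happen at the same point, so a tensor can never have symbols without a recorded source.

```python
# Toy model of the invariant: allocating symbolic shapes for a tensor
# and registering it as a guard source are a single operation, so the
# two bookkeeping structures cannot diverge (the old bug class).
class ShapeEnvSketch:
    def __init__(self):
        self.tracked_sources = {}   # tensor id -> source expr, for guard creation
        self.symbolic_shapes = {}   # tensor id -> allocated shape symbols

    def allocate_symbolic_shapes(self, tensor_id, ndim, source):
        # One entry point: a tensor with symbols but no source cannot exist.
        self.tracked_sources[tensor_id] = source
        self.symbolic_shapes[tensor_id] = [f"s{tensor_id}_{i}" for i in range(ndim)]
        return self.symbolic_shapes[tensor_id]

    def produce_guards(self):
        # Every symbolic dim can be traced back to a named source,
        # so guard production never hits an untracked tensor.
        return [
            f"{self.tracked_sources[tid]}.size({i}) == {sym}"
            for tid, syms in self.symbolic_shapes.items()
            for i, sym in enumerate(syms)
        ]

env = ShapeEnvSketch()
env.allocate_symbolic_shapes(0, 2, "L['x']")
for g in env.produce_guards():
    print(g)
```

The real fix is of course spread over Dynamo's builder and ShapeEnv, but the shape of the invariant is the same: no allocation path that bypasses tracking.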
- Assert for functional graph is FINALLY in master, and it caught more bugs in inductor lowerings when it landed. Hooray for more stringent asserts.
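For readers unfamiliar with the term, a rough illustration (my own sketch, not the actual assert in AOTAutograd) of what a "functional-only graph" check rules out: PyTorch spells in-place ops with a trailing underscore, and a functionalized graph should contain none of them.

```python
# Sketch of a functional-graph check: the traced graph must contain no
# in-place (mutating) ops. PyTorch's real naming convention is that
# in-place variants end with "_", e.g. aten.relu_ vs aten.relu.
def assert_functional(ops):
    mutating = [op for op in ops if op.endswith("_")]
    if mutating:
        raise AssertionError(f"non-functional ops in graph: {mutating}")

assert_functional(["aten.add", "aten.mul"])        # purely functional: passes
try:
    assert_functional(["aten.add", "aten.relu_"])  # in-place relu: caught
except AssertionError as e:
    print(e)
```

The value of such an assert is exactly what the update describes: inductor lowerings that quietly produced mutation now fail loudly at graph construction time instead of miscompiling later.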
What’s made it to master this week?
ezyang
- Save and restore tracked_fakes
- DebugInterpreter works with symbolic shapes now, plus test
- Add utility for binding symbols based on arguments passed to placeholders
- Stop using GraphArgs for shape env guard source tracking
- Suppress RecursionError in sympy; fix logging
- Deeply rework WeakIdKeyDictionary
- Add macro C10_AS_INTARRAYREF_SLOW
jbschlosser
- ModuleInfo-based tests for AOTAutograd
- LSTM SymInt-aware changes & meta registration (cuDNN)
- LSTM SymInt-aware changes & meta registration (non-cuDNN CUDA)
- Hack get_nbytes() to return 0 for sparse tensors as workaround for functionalization
- Make functional inverse for squeeze_copy SymInt-aware
- Make at::outer SymInt-aware
voz
- Introduce guardexpr, aot autograd guarding of duplicates into torch._guards
- Add shape_env guards to tracing context
- Add tracing context, Integrate dynamo guards into torch._guards
bdhirsh
- fix default partitioner: save sizes instead of tensor for backward when possible
- aot_autograd: add assert for functional-only graph
- fix aot autograd for None fw inputs
- fix aliasing bug in pixel shuffle/unshuffle
What’s coming next?
By Person:
- voz: vacation
- ezyang: vacation
- bdhirsh: continue AOTAutograd v2 follow up
- jbschlosser: merge to master and burn down
- Chillee: inductor integration (apparently Horace “has a few fixes”, but they are not posted yet)
Our north star:
- All benchmark models are passing aot_eager and inductor training on branch
- Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators
- All OpInfo tests passing
- Dynamic shapes on by default for developers / users
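The "fallback for custom operators" north-star item deserves a sketch. The idea (my own illustration of the stated plan; names and the use of NumPy stand-ins are hypothetical, the real mechanism would run the op on real tensors) is that when a custom op has no symbolic shape propagation rule, we can infer output shapes by executing the real implementation on cheap zero-filled inputs of the given shapes:

```python
import numpy as np

# Hypothetical fallback: given only input shapes, materialize zero-filled
# stand-ins, run the operator's real implementation once, and read the
# output shape off the result. This trades a real execution for not
# having to hand-write a symbolic shape rule.
def infer_output_shape(real_op, *input_shapes):
    fakes = [np.zeros(shape) for shape in input_shapes]
    return real_op(*fakes).shape

# e.g. a "custom" op that is a matmul under the hood:
print(infer_output_shape(lambda a, b: a @ b, (3, 4), (4, 5)))
```

The obvious caveat, and why this is a fallback rather than the default, is that the inferred shape is only valid for the concrete sizes probed; any size-dependent branching in the op would require guards on those sizes.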