State of symbolic shapes: Jan 20 edition
Previous update: State of symbolic shapes branch - #31 by ezyang
Executive summary
Volume-wise, there wasn’t that much activity, but three landed PRs had a disproportionate effect on our metrics. First, we landed Brian’s AOTAutograd fixes, which fixed a large number of assert failures; second, Horace is finally back on dynamic shapes and landed a PR that fixes a few Inductor inference dynamic shapes problems (fixing Inductor enough that we can start reporting master stats again); finally, I noticed an accounting problem with our stats, where many of the failures we were reporting actually had nothing to do with dynamic shapes. Overall, this pushed our delta for aot_eager down to TWO (one coverage failure, one timeout). This is fantastic, and we are turning our attention to other areas of dynamic shapes support:
- Brian is spearheading tracking the number of extra graph breaks caused by dynamic shapes (tracked on the “extra graph breaks” sheet at Symbolic shapes work items tracker - Google Sheets). For now, we are only looking at torchbench. We don’t yet have a consolidated statistic to track this week over week, but we will soon.
- Horace is grinding down Inductor inference failures with dynamic shapes (tracked on the “inductor eval” sheet at Symbolic shapes work items tracker - Google Sheets; the “horace” sheet reflects results with Horace’s WIP stack). We are in the process of transitioning regular CI coverage from testing aot_eager training to testing Inductor inference, which will let us report metrics on master comparable to aot_eager (this week we have a one-off metric; see the command sketch after this list for how these sweeps are run).
- Voz is working on improving our tracing time, which both OSS and internal users have called out as a problem; it matters doubly for dynamic shapes, which is ostensibly about improving compilation times. We are also in the process of preparing a consolidated statistic to track week over week.
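As promised above, here is a hedged sketch of how one of these dynamic shapes benchmark sweeps can be invoked. The flags below are representative of the benchmarks/dynamo runner; the exact set CI uses may differ:

```
python benchmarks/dynamo/torchbench.py --accuracy --inference \
    --backend inductor --dynamic-shapes --devices cuda
```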
We also need to start working on Inductor training support, which has its own unique challenges. We’ve also been discussing nested tensor / jagged tensor compilation with Inductor (e.g., PyTorch Composability Sync: Nested/Jagged Tensor compilation - YouTube). We are deprioritizing work to characterize how dynamic/static our benchmark suite is, and instead intend to evaluate this ad hoc on use cases where users come to us and say “hey, I need this to be dynamic.” One example is this script from Timothee Lacroix: Redirecting... (Meta only). There is some discussion about needing a more fine-grained way to turn on dynamic shapes (e.g., instead of turning it on for ALL local tensors, only turning it on for tensor dimensions that are known to be dynamic).
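To make the fine-grained idea concrete, here is a minimal sketch of what a per-dimension opt-in could look like. mark_dynamic here is a hypothetical helper, not a committed API; the real design is still under discussion:

```python
import torch
import torch._dynamo as dynamo

def mark_dynamic(t: torch.Tensor, dim: int) -> None:
    # Hypothetical: record which dimensions the user knows will vary, so the
    # compiler could allocate symbolic sizes for just those dims instead of
    # either specializing everything or making every local tensor dynamic.
    if not hasattr(t, "_dynamic_dims"):
        t._dynamic_dims = set()
    t._dynamic_dims.add(dim)

@dynamo.optimize("eager")
def f(x):
    return x * 2

x = torch.randn(8, 128)
mark_dynamic(x, 0)  # only the batch dimension is expected to vary
f(x)                # dims other than 0 could still be specialized
```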
Status report:
- Model training status on master. See also Symbolic shapes work items tracker - Google Sheets
  - aot_eager: -2 (-13 WoW). This reduction is primarily due to (1) https://github.com/pytorch/pytorch/pull/89532 landing and (2) us realizing that many of our failing models also failed with static shapes (Reclassify some dynamic aot_eager failures as static failures by ezyang · Pull Request #92376 · pytorch/pytorch · GitHub). There are literally only two models left: one is a coverage problem, the other is a tracing latency problem.
  - inductor eval: TODO.
    - On Horace’s stack, the numbers are slightly improved: 100/187 passing. Hoping to get this on master soon.
- OpInfo tests on symbolic shapes.
  - pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive: 542 passed (+26 WoW), 523 skipped (+1 WoW), 201 xfailed (-13 WoW)
  - pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive: 297 passed (+6 WoW), 143 skipped (no change), 193 xfailed (-4 WoW)
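For context, these exhaustive suites push every OpInfo sample through tracing with symbolic shapes. A minimal standalone example of the same mechanism (this is the real make_fx API; the function being traced is just an illustration):

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    # Size arithmetic flows through as SymInts under symbolic tracing.
    return x.reshape(x.shape[0] * x.shape[1])

# tracing_mode="symbolic" allocates symbolic sizes for the input rather than
# burning in 3 and 4, so the traced graph is reusable across input sizes.
gm = make_fx(f, tracing_mode="symbolic")(torch.randn(3, 4))
print(gm.graph)
```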
What’s made it to master since last time?
ezyang
- Make run_dynamic_ci_skips_only.sh more generic
- Do not specialize int/float with dynamic=True
- Remove dead TORCHDYNAMO_DYNAMIC_SHAPES print
- Make log parser work on inference runs too
- Reclassify some dynamic aot_eager failures as static failures
- Make clone-deps checkout correct Triton hash
- Check in some utility scripts for running dynamic shapes sweeps
- Fix AOTAutograd 2.0 perf regression involving as_strided
- Enable -Werror=bool-operation (needed for safe use of bitwise ops on SymBool in C++)
- Introduce sym_min and sym_max (brief usage example below, after the changelog)
- Update dynamic skips after #92076
- Reland “AOT Autograd refactor + cleanup, handle intermediate views of bases, use view replay, fix non-tensor input handling”
- Support --dynamic-ci-skips
voz (nothing dynamic shapes related)
Chillee
jbschlosser (nothing; just got back from PTO)
bdhirsh
nkaretnikov
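As flagged above, a quick note on the sym_min/sym_max PR: these are variants of min/max that also accept SymInts. My understanding of the intent is that under symbolic tracing they produce a symbolic max/min expression rather than guarding on which operand is larger. A small usage sketch (the function itself is illustrative):

```python
import torch

def f(x, y):
    # With plain ints this behaves like max(); with SymInts it stays symbolic
    # rather than forcing a guard on x.shape[0] >= y.shape[0].
    n = torch.sym_max(x.shape[0], y.shape[0])
    return x.new_zeros(n)

print(f(torch.randn(3, 2), torch.randn(5, 2)).shape)  # torch.Size([5])
```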
What’s coming next?
- ezyang: CI stuff, then probably trying to get inductor training going on master
- Chillee: hosing down inductor inference errors
- bdhirsh: working on dynamo graph breaks; also working on AOTDispatch enhancements for torchquant and nested tensor
- jbschlosser: not sure yet
- nkaretnikov: enabling dynamic shapes testing on inductor
Our north star: Dynamic shapes at feature parity with static shapes for PT2 release (but NOT turned on by default)