State of symbolic shapes: Jan 29 edition
Previous update: State of symbolic shapes branch - #37 by ezyang
Executive summary
We are two weeks away from branch cut for PyTorch 2.0. Enough of dynamic shapes has landed on master that we are not blocking the release: there is still a lot we want to get in before the release, but the most important stuff has landed. In particular, Horace landed more inference fixes, and we have also enabled CI for Inductor inference on master. There is a PR in progress for training (https://github.com/pytorch/pytorch/pull/93059), but our general thinking is that dynamic shapes is more important for inference (where you are more likely to want to vary sequence length) than for training.
Horace’s order of operations is: (1) basic training support, (2) inference performance on autoregressive generation, (3) other stuff; Edward will just be working on general enablement here and there. Voz is still working on trace time performance: some improvements have landed, and some very promising work on short circuiting meta computation at https://github.com/pytorch/pytorch/pull/93118 could lead to speed wins with static shapes too. Brian and Joel are still working on Dynamo graph breaks, although none of the PRs from this workstream have landed yet (we are still working out Dynamo code review).
- Models outside of the benchmark suite. We took some fun models out for a spin last week. wav2vec2 is successfully running inference under torch.compile with dynamic shapes. maskrcnn is not in as good a state, but a lot of its blockers are things we know about and have been working on.
- Accuracy failures. Background_Matting and LearningToPaint are failing accuracy with inductor inference with dynamic shapes, but not without dynamic shapes. These are a priority to fix.
- Documentation. This got its own post, but in case you missed it: there is now a manual for dynamic shapes enablement (The dynamic shapes manual - Google Docs). Let us know if there’s anything you’d like to see in it.
- How dynamic is the benchmark suite? Edward ran an experiment where he printed out the number of unique symbolic variables after tracing. Interestingly, most models only have one unique symbolic variable (likely the batch dimension.)
- Why is tracing so slow? Voz added a bunch of extra instrumentation to help better characterize what exactly we’re doing when tracing, and Horace ran some experiments. One of the more interesting results was that in hf_Bert inference, Dynamo produces a graph with 570 nodes, but after AOTAutograd this balloons to 1528 nodes. Making matters worse, fake tensor is invoked 47000 times (16k before AOTAutograd, 31k after). This is what pushed us in the direction of reducing fake tensor overhead with meta function short circuiting. Hacky experiments by Voz show we can get a 50-70% speedup this way. Also, pytree is slow; we are eagerly awaiting "[WIP][POC][pytree] Use OpTree for PyTree manipulation" by XuehaiPan (pytorch/pytorch pull request #92679).
- Model training status on master. See also Symbolic shapes work items tracker - Google Sheets
- aot_eager inference: -3 (NEW!). It turns out there are some models that are failing static shapes aot_eager training but not inference. These appear to be failing for straightforward coverage reasons and should be easily fixable.
- aot_eager training: -1 (-1 WoW). The only remaining error is a timeout, which we hope will be resolved by trace time performance work.
- inductor inference: -16 (NEW!). Doing a more direct comparison against Horace’s stack from last week, a manual sweep gives 143/160 (+43 WoW)
- inductor training: with Horace’s patch, 49/129 (NEW!)
- OpInfo tests on symbolic shapes.
- pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive: 547 passed (+5 WoW), 523 skipped (no change), 196 xfailed (-5 WoW)
- pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive: 302 passed (+5 WoW), 143 skipped, 828 deselected, 188 xfailed (-5 WoW)
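To make the "meta function short circuiting" idea from the tracing bullet concrete, here is a toy sketch in plain Python. It is not the code in the linked PR and it does not use any PyTorch API: `matmul_meta` and `ShortCircuitMeta` are hypothetical names invented for illustration. The assumption is simply that if the same op is asked for its output shape with the same input shapes many times, the shape computation can be done once and reused.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]


def matmul_meta(a: Shape, b: Shape) -> Shape:
    """Compute the output shape of a 2-D matmul without touching any data."""
    if len(a) != 2 or len(b) != 2 or a[1] != b[0]:
        raise ValueError(f"incompatible shapes {a} and {b}")
    return (a[0], b[1])


class ShortCircuitMeta:
    """Hypothetical wrapper that caches meta results keyed by input shapes."""

    def __init__(self, meta_fn: Callable[..., Shape]) -> None:
        self.meta_fn = meta_fn
        self.cache: Dict[Tuple[Shape, ...], Shape] = {}
        self.slow_calls = 0  # how many times the real meta function ran

    def __call__(self, *shapes: Shape) -> Shape:
        if shapes not in self.cache:
            # Cache miss: run the real shape computation once.
            self.slow_calls += 1
            self.cache[shapes] = self.meta_fn(*shapes)
        # Cache hit (or freshly filled): return the stored shape.
        return self.cache[shapes]


fast_matmul = ShortCircuitMeta(matmul_meta)
for _ in range(1000):
    out = fast_matmul((8, 16), (16, 32))

print(out, fast_matmul.slow_calls)  # (8, 32) 1
```

In this toy, 1000 requests cost one real shape computation. A real tracer would also have to key on dtype, device, strides, and (with dynamic shapes) symbolic sizes, none of which this sketch models.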
What’s made it to master since last time?
ezyang
- Make CPU inductor work with dynamic shapes
- Get rid of dedicated inductor dynamic_shapes config
- Make meshgrid support symbolic shapes
- Properly compute device for elementwise operations with CPU scalar tensor
- Fix some silly Inductor bugs
- Mark crossvit_9_240 as passing dynamic=True
- Add --timing and --explain to CI runs
- Make TensorIterator give better error message for symbolic tensors
- Run all of the timm models shards in the periodic
- Change ciflow/inductor to test inductor inference with dynamic shapes
- Add dynamic shapes aot_eager to periodic
- Make CI_SKIPS into a consolidated dict
- Make --inductor imply --backend inductor
- Switch CI exclusions to use exact match.
- Forward fix: restore sebotnet33ts_256 aot_eager skip
- Update aot_eager CI failures
- Add periodic job to test aot_eager on benchmarks suite.
- Refactor test_inductor_benchmark into test_single_dynamo_benchmark helper
- Log accuracy failure in more cases
- Add helpers for running tests and then putting them in a CSV
- Reenable mobilevit_s in CI, seems to pass
- Rename Makefile_dashboard to Makefile
Chillee
- Replace IndexingDiv with FloorDiv in Inductor
- Some more inductor fixes for symbolic shapes
- Fixed virtualized import and typing rule
- A bunch of fixes for Inductor + dynamic shapes enablement
- Guard solve behind mod for symbolic shapes
- Change convolution to use symbolic shapes for propagation
voz
- Fix positional issues in dedup guards
- Add @count util to torch, use it to track benchmark stats
- lru_cache shape expansion (20-25% speedup on local bench)
- Add --timing flag, phase timing to @dynamo_timed
jbschlosser
What’s coming next?
- ezyang: inductor inference accuracy failures, popcorn enablement
- Chillee: inductor training, autoregressive generation performance
- bdhirsh: dynamo graph breaks, inference functionalization (this looks like we will still need to put copy_ in the graph)
- jbschlosser: dynamo graph breaks
- nkaretnikov: finally getting the floor div patch series in (it fixes real bugs!)
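The post does not say which bugs the floor div series fixes, so the following is only general background on why floor division needs care. Python's `//` rounds toward negative infinity, while C-style integer division truncates toward zero, and the two disagree on negative operands. The helper names `floor_div` and `trunc_div` are made up for this sketch.

```python
import math


def floor_div(a: int, b: int) -> int:
    """Python semantics: round the quotient toward negative infinity."""
    return a // b


def trunc_div(a: int, b: int) -> int:
    """C-style semantics: round the quotient toward zero."""
    return math.trunc(a / b)


for a, b in [(7, 2), (-7, 2)]:
    print(a, b, floor_div(a, b), trunc_div(a, b))
# 7 2 3 3
# -7 2 -4 -3
```

The two agree for non-negative operands and differ by one when the signs differ and the division is inexact. That is the kind of mismatch a symbolic floor division operator has to pin down explicitly.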
Our north star: Dynamic shapes at feature parity with static shapes (but NOT turned on by default.)