State of symbolic shapes branch

State of symbolic shapes: Jan 29 edition

Previous update: State of symbolic shapes branch - #37 by ezyang

Executive summary

We are two weeks away from the branch cut for PyTorch 2.0. Enough of dynamic shapes is on master that we are not blocking the release: there is still a lot we want to get in before the release, but the most important pieces have landed. In particular, Horace landed more inference fixes, and we have also enabled CI for Inductor inference on master. There is a PR in progress for training (https://github.com/pytorch/pytorch/pull/93059), but our general thinking is that dynamic shapes is more important for inference (where you are more likely to want to vary sequence length) than for training.

Horace’s order of operations is: (1) basic training support, (2) inference performance on autoregressive generation, (3) other stuff; Edward will just be working on general enablement here and there. Voz is still working on trace time performance: some improvements have landed, and some very promising work on short circuiting meta computation at https://github.com/pytorch/pytorch/pull/93118 could lead to speed wins with static shapes too. Brian and Joel are still working on Dynamo graph breaks, although none of the PRs from this workstream have landed yet (we are still working out Dynamo code review).
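
To make the idea of short circuiting meta computation a bit more concrete, here is a minimal pure-Python sketch. It is not taken from the PR above and does not use any PyTorch API: `ShapeCache` and `matmul_shape` are hypothetical names, and the sketch assumes that "short circuiting" means skipping a repeated shape computation when the same op has already been seen with the same input shapes.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]


class ShapeCache:
    """Hypothetical cache mapping (op name, input shapes) -> output shape."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Tuple[Shape, ...]], Shape] = {}
        self.hits = 0
        self.misses = 0

    def run(self, op_name: str, rule: Callable[..., Shape], *shapes: Shape) -> Shape:
        key = (op_name, shapes)
        if key in self._cache:
            # Short circuit: reuse the previously computed output shape.
            self.hits += 1
            return self._cache[key]
        # Slow path: actually run the shape rule, then remember the result.
        self.misses += 1
        out = rule(*shapes)
        self._cache[key] = out
        return out


def matmul_shape(a: Shape, b: Shape) -> Shape:
    """Toy shape rule for a 2-D matmul: (n, k) @ (k, m) -> (n, m)."""
    if a[1] != b[0]:
        raise ValueError(f"shape mismatch: {a} @ {b}")
    return (a[0], b[1])


cache = ShapeCache()
# Illustrative only: twelve identical calls, as in a stack of identical layers.
for _ in range(12):
    out = cache.run("matmul", matmul_shape, (8, 768), (768, 768))

print(out, cache.hits, cache.misses)
```

In this toy, only the first call pays for the shape rule; the remaining eleven are cache hits. Whether the real work keys on the same information is an assumption of the sketch.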

  • Models outside of the benchmark suite. We took some fun models out for a spin last week. wav2vec2 is successfully running inference under torch.compile with dynamic shapes. maskrcnn is not in as good a state, but a lot of its blockers are things we know about and have been working on.
  • Accuracy failures. Background_Matting and LearningToPaint are failing accuracy with inductor inference with dynamic shapes, but not without dynamic shapes. These are a priority to fix.
  • Documentation. This got its own post, but in case you missed it: there is now a manual for dynamic shapes enablement, "The dynamic shapes manual" (Google Docs). Let us know if there’s anything you’d like to see in it.
  • How dynamic is the benchmark suite? Edward ran an experiment where he printed out the number of unique symbolic variables after tracing. Interestingly, most models only have one unique symbolic variable (likely the batch dimension.)
  • Why is tracing so slow? Voz added a bunch of extra instrumentation to help better characterize what exactly we’re doing when tracing, and Horace ran some experiments. One of the more interesting results was that in hf_Bert inference, Dynamo produces a graph with 570 nodes, but after AOTAutograd this balloons to 1528 nodes. Making matters worse, fake tensor is invoked 47000 times (16k occurring before AOTAutograd, 31k after). This is what pushed us in the direction of reducing fake tensor overhead with meta function short circuiting. Hacky experiments by Voz show we can get a 50-70% speedup this way. Also, pytree is slow; we are eagerly awaiting pytorch/pytorch pull request #92679, "[WIP][POC][pytree] Use OpTree for PyTree manipulation" by XuehaiPan.
  • Model training status on master. See also the Symbolic shapes work items tracker (Google Sheets).
    • aot_eager inference: -3 (NEW!). It turns out there are some models that are failing static shapes aot_eager training but not inference. These appear to be failing for straightforward coverage reasons and should be easily fixable.
    • aot_eager training: -1 (-1 WoW). The only remaining error is a timeout, which we hope will be resolved by trace time performance work.
    • inductor inference: -16 (NEW!). Doing a more direct comparison against Horace’s stack from last week, a manual sweep gives 143/160 (+43 WoW)
    • inductor training: with Horace’s patch, 49/129 (NEW!)
  • OpInfo tests on symbolic shapes.
    • pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive - 547 passed (+5 WoW), 523 skipped (no change), 196 xfailed (-5 WoW)
    • pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive - 302 passed (+5 WoW), 143 skipped, 828 deselected, 188 xfailed (-5 WoW)
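
As an illustration of the "how dynamic is the benchmark suite?" measurement above, the following sketch counts unique symbolic variables across a set of traced shapes. It is a toy under stated assumptions, not Edward's actual experiment: shapes are plain tuples, a symbolic dimension is represented by a string name such as "s0", and `unique_symbols` and the sample shapes are hypothetical. Real symbolic sizes can also be expressions over several symbols, which this toy does not model.

```python
from typing import Iterable, Set, Tuple, Union

# A dimension is either a concrete int or the name of a symbolic variable.
Dim = Union[int, str]


def unique_symbols(shapes: Iterable[Tuple[Dim, ...]]) -> Set[str]:
    """Collect every distinct symbol name that appears in any shape."""
    return {dim for shape in shapes for dim in shape if isinstance(dim, str)}


# Made-up shapes for a model where only the batch dimension is dynamic.
traced_shapes = [
    ("s0", 3, 224, 224),
    ("s0", 64, 112, 112),
    ("s0", 1000),
]

symbols = unique_symbols(traced_shapes)
print(len(symbols), sorted(symbols))
```

A model whose only dynamic dimension is the batch size ends up with a single symbol in this toy, which matches the pattern reported for most models in the suite.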

What’s made it to master since last time?

ezyang

Chillee

voz

jbschlosser

What’s coming next?

  • ezyang: inductor inference accuracy failures, popcorn enablement
  • Chillee: inductor training, autoregressive generation performance
  • bdhirsh: dynamo graph breaks, inference functionalization (it looks like we will still need to put copy_ in the graph)
  • jbschlosser: dynamo graph breaks
  • nkaretnikov: finally getting the floor div patch series in (it fixes real bugs!)

Our north star: Dynamic shapes at feature parity with static shapes (but NOT turned on by default.)