State of symbolic shapes: Mar 12 edition
Previous update: State of symbolic shapes branch - #45 by ezyang
Executive summary
For your information:
- Training support is now properly fixed on master; modifying functorch config is no longer necessary. Some Inductor bugs were fixed, some are still pending.
- `specialize_int = False` (aka `--unspecialize-int`) is now the default in CI (with some regressions), and will soon be the default for regular users too.
- Dynamic shapes are working for whole-graph inference (e.g., BERT), but we often overspecialize when there are graph breaks. Two weeks ago we fixed overspecialization on size ints that were carried across graph breaks (when `specialize_int=False`); a similar fix for torch.Size carried across graph breaks is pending at Don’t specialize torch.Size with specialize_int = False
- A reminder that if you are debugging overspecialization problems, you can slap a `torch._dynamo.mark_dynamic_dim(tensor, dim)` on the dimension you expect to be dynamic to see whether it actually is dynamic or not. You’ll still have to diagnose the problem yourself, but we’ve recently been having success with extra logging on ShapeEnv, cf. debug shape guards by avikchaudhuri · Pull Request #95848 · pytorch/pytorch · GitHub. (A minimal sketch of this workflow follows this list.)
Stuff that happened:
- Some symbolic shapes adjacent news:
  - The case of the flaky dynamo export tests. This is not exactly symbolic shapes related, but the debugging session came about while I was working on some symbolic shapes related changes. This bug is of interest to anyone working with Dynamo: Debugging story: The case of the flaky Dynamo export tests
  - Fallthrough works correctly with PyOperator. If you use functorch control flow operators for export, you should be happy to know that PyOperator fallthrough now works (see Correctly resolve dispatch keys for PyOperator stack), which should solve some problems that Angela Yi was working on unblocking.
- Tons of bugs fixed, still lots more bugs. Some highlights:
  - aot autograd: dont allow symint outputs to get tangents in the bw graph (fixes `'NoneType' object has no attribute '_has_symbolic_sizes_strides'`)
  - Properly avoid wrapping numbers as tensors before backend (fixes `AssertionError: 1900: <class 'torch.Tensor'>, 256: <class 'int'>`)
  - Use maxint to bound integers. (helps CrystalDPR enablement get past MAX_INT tests on unbacked SymInts)
  - Don’t guard on the exact int value on conversion to bool (overspecialization fix, affected multiple real models including OpenNMT)
  - Fix training enablement in AOTAutograd (the promised training fix from last week)
- Export is getting a higher level constraints API. This adds a new `constraints` kwarg to export, which lets you declare things like “these dimensions are dynamic” (`dynamic_dim(y, 0)`). See Add higher level export constraints api for the details. (A rough call-site sketch appears after this list.)
- GCP benchmarking. It’s here! Dynamic shapes benchmarking for the dashboard is unblocked but hasn’t been started yet. This will probably be done concurrently with Improving performance dashboard latency / ROI - Google Docs
- State of real world model enablement.
  - LLaMA and InstructPix2Pix were added to torchbench: Add LLAMA by msaroufim · Pull Request #1446 · pytorch/benchmark · GitHub and Add instructpix2pix model by xuzhao9 · Pull Request #1451 · pytorch/benchmark · GitHub. They haven’t been enabled for our regular benchmark runs yet, stay tuned!
  - OpenNMT arange minimal repro runs without error and does not overspecialize. Vince was giving the whole model a try, but there are some problems with guard provenance: torch._dynamo.exc.Unsupported: dynamic shapes: arange · Issue #93468 · pytorch/pytorch · GitHub
  - fmo-mt reported that huggingface BERT + TransformerEncoderLayer was failing with dynamic shapes; the PR to fix it is in review. Issue: Runtime error of TorchDynamo compiled model (unflatten) · Issue #95868 · pytorch/pytorch · GitHub Fix: Don’t specialize torch.Size with specialize_int = False #96419. This would also be fixed if we fixed the HF-induced graph break, tracked at [Dynamo] HuggingFace transformers configuration_utils graph break workaround · Issue #96205 · pytorch/pytorch · GitHub
  - Someone tried to run an MNIST model with dynamic=True, and apparently dropout doesn’t work with symbolic sizes. Torch Dynamo backend compilation error with dynamic = True · Issue #96469 · pytorch/pytorch · GitHub
  - The optimized SDPA Transformer implementation in PyTorch also doesn’t work with symbolic sizes. torch.compile(dynamic=True) does not work with a simple Transformer + embedding at inference · Issue #96414 · pytorch/pytorch · GitHub
  - Some folks at IBM are interested in getting huggingface T5 working with dynamic shapes. They ran into some issues, both dynamic and non-dynamic related, many of which are now fixed; not sure what the current status is. Issue: torch.compile fails when compiling a T5-style model with HF interfaces · Issue #96130 · pytorch/pytorch · GitHub
  - No updates: MaskRCNN, Detectron2, wav2vec2, fused boolean mask update
- How far away are we from dynamic shapes “shipping”? Soumith defines shipping as “we show meaningful end-to-end speedups in training due to dynamic shapes” (possibly with s/training/inference/ as a milestone on the way to this goal), but at the same time, we are in the business of creating generic infrastructure that works for everything, not just cherry-picked models.
  - On training, the metric is difficult to define, because most models don’t really have dynamic shapes in training. So to evaluate real world models, we would have to take models that do have dynamic shapes in training (which can be observed by the fact that, without dynamic shapes, they blow out the compile cache). We don’t have this list of models yet: we need to sweep the benchmark suite and find out (although there’s a decent chance models involving bounding boxes, like detectron2 and MaskRCNN, are on this list). Plus, you have to make sure you’re running enough iterations, with real enough data, to actually exercise this data dependence! It’s possible things are working today already; we just don’t have the data.
  - On inference, the value add of dynamic shapes is much easier to articulate: people want to vary batch size (for adaptive batching), sequence length (because text input is variable size) and image size (ditto), and we have already shown dynamic shapes working end-to-end for models like BERT (a toy sketch of this usage pattern follows this list). The flip side of the coin is whether dynamic shapes is “shipped by default”: that means adaptively turning on dynamic shapes when we detect situations where we risk blowing out the compile cache, which is probably a week or two of full-time work to finish off.
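To make the inference use case concrete, here is a toy sketch (the model below is an illustrative stand-in of my own, not BERT): compile once with dynamic shapes enabled, then serve inputs of varying batch size and sequence length.

```python
# Toy sketch of the inference use case: one compile, many input shapes.
# TinyClassifier is an illustrative stand-in; real workloads would be
# BERT-style encoders and the like.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, vocab=1000, dim=64, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, classes)

    def forward(self, tokens):
        # mean-pool over the (variable-length) sequence dimension
        return self.head(self.emb(tokens).mean(dim=1))

model = TinyClassifier().eval()
compiled = torch.compile(model, dynamic=True)

with torch.no_grad():
    # vary batch size (adaptive batching) and sequence length (variable text)
    for batch, seqlen in [(1, 16), (8, 33), (4, 128)]:
        tokens = torch.randint(0, 1000, (batch, seqlen))
        out = compiled(tokens)  # ideally all served by a single dynamic graph
```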
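And for the export constraints API mentioned earlier in this list, a rough sketch of what a call site could look like. The `constraints` kwarg and `dynamic_dim(y, 0)` are taken from the description above; the `torch._dynamo.export` entry point and especially the import location of `dynamic_dim` are assumptions on my part and may not match what actually landed:

```python
# Rough sketch only: `constraints=` and `dynamic_dim` come from the description
# above; the export entry point and the location of dynamic_dim are assumptions
# and may not match your PyTorch version.
import torch
import torch._dynamo as dynamo

def f(x, y):
    return x + y.sum(dim=0)

x = torch.randn(4)
y = torch.randn(7, 4)

# Declare that dimension 0 of y should remain dynamic in the exported graph.
gm, guards = dynamo.export(
    f,
    x,
    y,
    constraints=[dynamo.dynamic_dim(y, 0)],  # assumed spelling/location
)
```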
The numbers:
- Model status on master. See also Symbolic shapes work items tracker - Google Sheets
  - aot_eager inference: -1 (-1 WoW). The regression is a new sympy RecursionError in vision_maskrcnn induced by `specialize_int=False` when running `reshape(torch.empty(s1, (s0 + 1)//2, 2), (s1, s0))`. Logs
  - aot_eager training: -2 (-2 WoW). The regressions are also two sympy RecursionErrors induced by `specialize_int=False`. botnet26t_256 looks like the same cause (reshape) as vision_maskrcnn, but eca_botnext26ts_256 looks like some sort of modulus problem. Logs
  - inductor inference: -4 (+6 WoW, or unchanged, depending on how you count it). We regressed this stat with `specialize_int = False`, but we fixed most of the regression in time for the report. We did one trace: volo_d1_224 is now fixed, but convit_base is failing with a new error, “TypeError: Cannot convert symbols to int”.
  - inductor training: -9 (NEW!). Training is enabled in CI! The bulk of the current failures are `'float' object has no attribute '_has_symbolic_sizes_strides'` (this is due to AOTAutograd sending graphs with SymFloat inputs, contrary to Inductor’s input contract). There is one accuracy failure with rexnet_100; however, this model is known to have flaky accuracy with static shapes too.
- Opinfo tests on symbolic shapes.
  - `pytest test/test_proxy_tensor.py -k test_make_fx_symbolic_exhaustive` - 566 passed (+4 WoW), 524 skipped (+1 WoW), 192 xfailed (-3 WoW)
  - `pytest test/functorch/test_aotdispatch.py -k test_aot_autograd_symbolic_exhaustive`
- Graph breaks on master. 0ish (unchanged). hf_Longformer and AllenaiLongformerBase are still diverging intermittently. Graph breaks will be in CI and we will resolve this one way or another.
- Tracing cost of enabling dynamic shapes (aot_eager). Mean: 15s (-5s), Max: 168s (-72s WoW). Not really sure where the speedups are coming from, but we’ll take it!
  - Repro command: `benchmarks/dynamo/run_delta.sh --backend aot_eager --devices cuda --cold-start-latency --ci`
Known problems
Unchanged from State of symbolic shapes branch - #45 by ezyang
What’s coming next?
- ezyang: burning down the biggest blocker bugs (right now, that’s float handling). Also need to set up perf CI.
- Chillee: unclear
- bdhirsh: per-dispatch key mode stacks and then torchquant/DTensor PT2 support
- voz: finishing up in flight work
- jbschlosser: enabling dynamic shapes for nested tensor