State of symbolic shapes: Jun 19 edition
Previous update: State of symbolic shapes branch - #58 by ezyang
Executive summary
- **Dynamic and blueberries in the benchmark suite as model sets.** A model set (notated with square brackets) is a subset of models from our existing benchmarks which we aggregate separately to track something we care about. The Dynamic model set covers models for which we expect dynamic shapes support to be relevant. Here is the current list, along with some potential threats to validity that need follow-up:

  ```
  // _generate variants are good; they do E2E autoregressive
  // generation and will induce varying context length.
  cm3leon_generate
  nanogpt_generate
  hf_T5_generate
  // detection models are ok-ish; the good news is they call
  // nonzero internally and exercise dynamic shapes that way,
  // the bad news is we may not run enough iterations with
  // varying data to get varying numbers of bounding boxes.
  detectron2_fcos_r_50_fpn
  vision_maskrcnn
  // this recommendation model internally uses sparse tensors,
  // but once again it's not clear that dynamic shapes are
  // exercised on this sparsity
  dlrm
  // these language models are only running a single next
  // word prediction; we're NOT testing dynamic sequence
  // length performance
  llama
  BERT_pytorch
  hf_T5
  // the GNN benchmarks only run one batch, so you aren't
  // actually triggering dynamism (and we didn't explicitly
  // mark anything as dynamic)
  basic_gnn_edgecnn
  basic_gnn_gcn
  basic_gnn_gin
  basic_gnn_sage
  ```

  The blueberries set is meant to capture important LLM models, but it is very much a WIP right now. (For what "actually triggering dynamism" takes in practice, see the sketch below.)
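  To make the "not actually triggering dynamism" caveat concrete, here is a minimal sketch (my illustration, not from the benchmark suite) of the two ways a benchmark can exercise dynamic shapes: run multiple iterations with varying sizes so automatic dynamic kicks in on recompile, or explicitly mark a dimension dynamic up front:

  ```python
  import torch
  from torch._dynamo import mark_dynamic

  @torch.compile
  def step(x):
      return x.sin()

  # Option 1: vary the shape across iterations. The first call compiles
  # a static graph; the second call recompiles, and with automatic
  # dynamic the varying dimension becomes symbolic. A suite that only
  # runs one batch never gets here.
  step(torch.randn(8, 32))
  step(torch.randn(16, 32))

  # Option 2: explicitly mark a dimension dynamic, so even a single
  # iteration exercises dynamic shapes.
  y = torch.randn(8, 32)
  mark_dynamic(y, 0)
  step(y)
  ```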
- **Dynamic shapes by default.** We made a lot of progress. Phase 1 is completely landed in master; Phase 2 has a PR open that is passing all CI tests: Enable automatic_dynamic_shapes by default by ezyang · Pull Request #103623 · pytorch/pytorch · GitHub. After discussion with CK/Xiaodong, we're also going to try YOLO'ing internal enablement here too, after I add instrumentation that will help us detect when `automatic_dynamic_shapes` triggered. I also promised gchanan that I would rename `automatic_dynamic_shapes` to something clearer, maybe `automatic_dynamic_on_recompile`. PSA: you probably don't want `dynamic=True`, especially if you're running into bugs; use `automatic_dynamic_shapes=True`! (A sketch of the difference follows below.)
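  To spell out the PSA, here is a minimal sketch (my illustration) of the two configurations, using the `torch._dynamo.config.automatic_dynamic_shapes` flag that the PR above turns on by default:

  ```python
  import torch
  import torch._dynamo.config

  # Discouraged: compile everything dynamic from the start; this hits
  # more dynamic-shapes-only code paths (and therefore more bugs).
  f_dynamic = torch.compile(lambda x: x.sin(), dynamic=True)
  f_dynamic(torch.randn(8))

  # Preferred: start static and only make a dimension dynamic once a
  # recompile proves it actually varies.
  torch._dynamo.config.automatic_dynamic_shapes = True
  f_auto = torch.compile(lambda x: x.sin())
  f_auto(torch.randn(8))   # compiled fully static
  f_auto(torch.randn(16))  # recompiled with the varying dim symbolic
  ```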
- **How to test for dynamic shapes without `dynamic_shapes`.** So you want to add a new feature to PT2, but it doesn't work with dynamic shapes. What can you do? (Both techniques are sketched after this list.)
  - Force specialization when it applies. All backends (e.g., inductor) are permitted to force extra specializations that were not strictly necessary. So if you know that you absolutely want your feature to apply, you can just specialize (e.g., by just `int()`'ing a SymInt). With dynamic shapes, you may end up with some extra int inputs in your FX graph that are actually static, but these are easy enough to ignore by testing whether your input is a Tensor or not. This is what we did for CUDA graphs.
  - Test if there are `torch.fx.experimental.symbolic_shapes.free_symbols`. If everything is static, then there are no free symbols. This works best if you're in some local situation where you need to decide whether to do something to a single tensor; if you're doing analysis on an FX graph it's doable too (you just may need to check multiple nodes). This is what we did for layout optimization.
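  A minimal sketch of both techniques (my illustration; `free_symbols` is the real helper named above, while the surrounding functions are hypothetical):

  ```python
  import torch
  from torch.fx.experimental.symbolic_shapes import free_symbols

  # Technique 1: force specialization. int() on a SymInt installs a
  # guard and specializes to the concrete value, so a static-only
  # feature can apply (at the cost of recompiling if it changes).
  def specialized_numel(t: torch.Tensor) -> int:
      return int(t.numel())

  # Technique 2: check for free symbols. If a tensor's sizes/strides
  # involve no symbolic values, free_symbols returns an empty set and
  # the static-only path is safe without forcing any specialization.
  def is_fully_static(t: torch.Tensor) -> bool:
      return len(free_symbols(t)) == 0
  ```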
- **Notable bug fixes.**
  - Allow for sympy.Expr in tensor lowering in inductor (discovered when enabling automatic dynamic everywhere)
  - Cast computation_node_input_size to int (yolov3 automatic dynamic fix)
  - Don't apply automatic_dynamic_shapes if we force tensor to be static (major automatic dynamic bug fix: we were accidentally making parameters dynamic, which could occur when a module block was instantiated multiple times at different parameter sizes; this was probably the biggest source of failures with automatic dynamic)
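  For context on that last fix, here is a hypothetical repro of the pattern (my reconstruction from the description above): two instances of the same module class at different parameter sizes share one code object, so the recompile used to mark the parameter dimensions dynamic even though parameters never vary for a given instance:

  ```python
  import torch
  import torch.nn as nn

  class Block(nn.Module):
      def __init__(self, dim: int):
          super().__init__()
          self.proj = nn.Linear(dim, dim)

      def forward(self, x):
          return self.proj(x)

  # Both instances run the same Block.forward code object. The second
  # compile sees the weights at a new size and, before the fix, would
  # apply automatic dynamic to the parameters, not just the inputs.
  small = torch.compile(Block(64))
  large = torch.compile(Block(128))
  small(torch.randn(2, 64))
  large(torch.randn(2, 128))
  ```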
- **Notable new issues.**
  - `torch.compile` error with `dynamic=True`: Found <class ‘sympy.core.relational.Unequality’>, which is not a supported top level IR node (affects StableDiffusionPipeline from HuggingFace)
- **CI skips.** -3, -1, -1, -2 (-2, 0, 0, 0 WoW). The regression is dlrm and hf_T5_generate, from the switch of inference benchmarking from float32 to bfloat16, tracked at dlrm and hf_T5_generate fails aot_eager with bfloat16+dynamic_shapes · Issue #103760 · pytorch/pytorch · GitHub
**Training dashboard (as of 7b3242d5f7).** This week on HUD
| Metric | Torchbench | Huggingface | TIMM models | Dynamic |
| --- | --- | --- | --- | --- |
| Passrate | 89%, 57/64 | 98%, 45/46 | 100%, 60/60 | 88%, 7/8 |
| Speedup | 1.13x → 1.11x | 1.59x | 1.18x → 1.19x | 1.29x → 1.30x |
| Comptime | 79s → 67s | 103s → 99s | 136s → 110s | 33s → 31s |
| Memory | 0.93x → 0.94x | 1.00x | 1.01x | 1.59x |
Not much to report. The torchbench speedup decrease appears to be due to a clear 10% regression on timm_efficientdet; however, it's unclear how real this regression is, because this model has always failed accuracy. TIMM is within noise.
**Inference dashboard (as of 7b3242d5f7).** This week on HUD
Inference was swapped to bfloat16, so… we don't really have any historical point of comparison, because previously we were only running amp. Here's a snapshot of the data from the most recent run.
| Metric | Torchbench | Huggingface | TIMM models | Dynamic |
| --- | --- | --- | --- | --- |
| Passrate | 88%, 63/72 | 100%, 46/46 | 100%, 60/60 | 58%, 7/12 |
| Speedup | 1.52x | 1.64x | 1.72x | 1.92x |
| Comptime | 24s | 38s | 30s | 45s |
| Memory | 0.82x | 1.15x | 1.06x | 1.11x |
Some thoughts from an apples-to-oranges comparison:
- In absolute terms, the torchbench pass rate went down, but two models (I cannot easily tell which from the display) were removed from the suite entirely.
- PT2 is more beneficial on bfloat16 than AMP, which is expected!
- Memory compression is extremely bad. We still need to figure this out.
What’s next?
- Edward: PSC, dynamic by default last mile and internal telemetry, maybe bug fixing if I can squeeze it in