State of symbolic shapes branch

State of symbolic shapes: Jun 19 edition

Previous update: State of symbolic shapes branch - #58 by ezyang

Executive summary

  • Dynamic and blueberries are now in the benchmark suite as model sets. A model set (notated with square brackets) is a subset of models from our existing benchmarks which we aggregate separately to track something we care about. The Dynamic model set covers models for which we expect dynamic shapes support to be relevant. Here is the current list, along with some potential threats to validity that need follow up:
    // _generate variants are good; they do E2E autoregressive
    // generation and will induce varying context length.
    cm3leon_generate
    nanogpt_generate
    hf_T5_generate
    // detection models are ok-ish; the good news is they call
    // nonzero internally and exercise dynamic shapes that way,
    // the bad news is we may not run enough iterations with
    // varying data to get varying numbers of bounding boxes.
    detectron2_fcos_r_50_fpn
    vision_maskrcnn
    // this recommendation model internally uses sparse tensors,
    // but once again it's not clear that dynamic shapes are
    // exercised on this sparsity
    dlrm
    // these language models are only running a single next
    // word prediction, so we're NOT testing dynamic sequence
    // length performance
    llama
    BERT_pytorch
    hf_T5
    // the GNN benchmarks only run one batch, so you
    // aren't actually triggering dynamism (and we didn't
    // explicitly mark anything as dynamic)
    basic_gnn_edgecnn
    basic_gnn_gcn
    basic_gnn_gin
    basic_gnn_sage
    
    The blueberries set is meant to capture important LLM models, but it is very much a WIP right now.
  • Dynamic shapes by default. We made a lot of progress. Phase 1 is completely landed in master; Phase 2 has a PR open that is passing all CI tests: Enable automatic_dynamic_shapes by default by ezyang · Pull Request #103623 · pytorch/pytorch · GitHub. After discussion with CK/Xiaodong we're also going to try YOLO'ing internal enablement, after I add instrumentation that will help us detect when automatic_dynamic_shapes triggers. I also promised gchanan that I would rename automatic_dynamic_shapes to something clearer, maybe automatic_dynamic_on_recompile. PSA: you probably don't want dynamic=True, especially if you're running into bugs; use automatic_dynamic_shapes=True (see the first sketch after this list)!
  • How to deal with dynamic shapes when your feature doesn't support them. So you want to add a new feature to PT2, but it doesn't work with dynamic shapes. What can you do?
    • Force specialization when it applies. All backends (e.g., inductor) are permitted to force extra specializations that are not strictly necessary. So if you know that you absolutely want your feature to apply, you can simply specialize (e.g., by int()'ing a SymInt). With dynamic shapes, you may end up with some extra int inputs in your FX graph that are actually static, but these are easy enough to ignore by testing whether each input is a Tensor or not (sketched after this list). This is what we did for CUDA graphs.
    • Test whether there are any torch.fx.experimental.symbolic_shapes.free_symbols. If everything is static, then there are no free symbols. This works best when you're in a local situation where you need to decide what to do with a single tensor, but it's also doable if you're doing analysis on an FX graph (you just may need to check multiple nodes); see the last sketch after this list. This is what we did for layout optimization.
  • Notable bug fixes.
  • Notable new issues.
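
Since the PSA above keeps coming up, here is a minimal sketch of the recommended configuration (flag spellings as of today; automatic_dynamic_shapes may get renamed as mentioned above):

    import torch
    import torch._dynamo.config as dynamo_config

    # Preferred: compile statically at first, and only mark a dimension as
    # dynamic when a recompile is triggered by a size change on that dimension.
    dynamo_config.automatic_dynamic_shapes = True

    def f(x):
        return x * 2

    compiled = torch.compile(f)      # note: NOT torch.compile(f, dynamic=True)
    compiled(torch.randn(4, 8))      # first call: static shapes
    compiled(torch.randn(7, 8))      # size change: recompile with dim 0 dynamic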
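For the "force specialization" option, here is a minimal sketch of the two tricks it mentions; force_static and tensor_inputs_only are hypothetical helper names, not an existing API:

    import torch

    def force_static(size):
        # `size` may be a plain int or a torch.SymInt.  Calling int() on a
        # SymInt installs a guard on the value actually seen and specializes
        # the compiled graph to it, so the feature can assume a static size.
        return int(size)

    def tensor_inputs_only(example_inputs):
        # Under dynamic shapes the FX graph may gain extra scalar (SymInt)
        # inputs alongside the tensors; a backend feature that only cares
        # about tensors can simply skip the non-Tensor inputs.
        return [x for x in example_inputs if isinstance(x, torch.Tensor)]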
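And for the free_symbols check, a minimal sketch; is_static and graph_is_static are hypothetical names, and the node.meta["val"] convention assumes an AOTAutograd/export-style graph where each node carries a fake-tensor example value:

    import torch
    from torch.fx.experimental.symbolic_shapes import free_symbols

    def is_static(t):
        # Local check: a tensor (or SymInt) is fully static iff it mentions
        # no free shape symbols, so a static-only optimization is safe here.
        return len(free_symbols(t)) == 0

    def graph_is_static(gm: torch.fx.GraphModule):
        # Whole-graph check (e.g. for a layout optimization pass): look at
        # the example value recorded on each node and bail out if any of
        # them involve a free symbol.
        for node in gm.graph.nodes:
            val = node.meta.get("val")
            if val is not None and len(free_symbols(val)) > 0:
                return False
        return True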

CI skips. -3, -1, -1, -2 (-2, 0, 0, 0 WoW). The regression is dlrm and hf_T5_generate failing after the switch of inference benchmarking from float32 to bfloat16, tracked at dlrm and hf_T5_generate fails aot_eager with bfloat16+dynamic_shapes · Issue #103760 · pytorch/pytorch · GitHub.

Training dashboard (as of 7b3242d5f7). This week on HUD

Metric   | Torchbench    | Huggingface | TIMM models   | Dynamic
Passrate | 89%, 57/64    | 98%, 45/46  | 100%, 60/60   | 88%, 7/8
Speedup  | 1.13x → 1.11x | 1.59x       | 1.18x → 1.19x | 1.29x → 1.30x
Comptime | 79s → 67s     | 103s → 99s  | 136s → 110s   | 33s → 31s
Memory   | 0.93x → 0.94x | 1.00x       | 1.01x         | 1.59x

Not much to report. The torchbench speedup decrease appears to be due to a clear 10% regression on timm_efficientdet. However, it's unclear how real this regression is, because this model has always failed accuracy. The TIMM change is within noise.

Inference dashboard (as of 7b3242d5f7). This week on HUD

Inference was swapped to bfloat16, so… we don't really have any historical point of comparison, because previously we were only running AMP. Here's a snapshot of the data from the most recent run.

Metric   | Torchbench  | Huggingface | TIMM models | Dynamic
Passrate | 88%, 63/72  | 100%, 46/46 | 100%, 60/60 | 58%, 7/12
Speedup  | 1.52x       | 1.64x       | 1.72x       | 1.92x
Comptime | 24s         | 38s         | 30s         | 45s
Memory   | 0.82x       | 1.15x       | 1.06x       | 1.11x

Some thoughts from an apples-to-oranges comparison:

  • In absolute terms, the torchbench pass rate went down, but two models (I cannot easily tell which ones from the display) were removed from the suite entirely.
  • PT2 is more beneficial on bfloat16 than AMP, which is expected!
  • Memory compression is extremely bad. We still need to figure this out.

What’s next?

  • Edward: PSC, dynamic by default last mile and internal telemetry, maybe bug fixing if I can squeeze it in