State of symbolic shapes: May 6 edition
Previous update: State of symbolic shapes branch - #52 by ezyang
Executive summary
- **Big grind for dynamic by default.** @voz reports that enabling dynamic by default (more specifically: static by default, but automatically turning on dynamic shapes on recompile) has turned into a bit of a slog. Some of the reasons: (1) people have been adding new test files to the Dynamo test suite, but those tests have not been simultaneously run with dynamic shapes, so we were running blind there; (2) it’s not risk-free to recompile into dynamic shapes (our plan to derisk here is to remove `dynamic_shapes=True` but keep automatic dynamic off to start), and in practice some things broke on the full PyTorch test suite + Dynamo; (3) there are a lot of conditionals on `dynamic_shapes` scattered throughout the codebase, and they all have to be rewritten not to do this. See the sketch after this list for the intended automatic-dynamic behavior.
- **Accuracy work on dynamic shapes should be unblocked by the improved accuracy minifier.** We’ve fixed one bug with the help of the new minifier infra described in “Major updates to the after AOT accuracy minifier!”; hopefully we can nail more!
- **Notable bug fixes.**
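For context on the first bullet, here is a minimal sketch of what automatic dynamic is supposed to do, assuming the `torch._dynamo.config.automatic_dynamic_shapes` flag (the exact flag name and default have been in flux while this work lands, so treat it as illustrative):

```python
import torch
import torch._dynamo

# Illustrative only: flag name/default may differ across versions.
torch._dynamo.config.automatic_dynamic_shapes = True

@torch.compile
def f(x):
    return x * 2

f(torch.randn(4, 8))  # first call: compiled with static (specialized) shapes
f(torch.randn(5, 8))  # dim-0 size changed: recompile, dim 0 now treated as dynamic
f(torch.randn(6, 8))  # no further recompile expected for dim-0 size changes
```

(`torch._dynamo.mark_dynamic(x, 0)` is the existing manual way to get the same effect up front, without waiting for a recompile.)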
**CI skips.** 0, 0, 0, -2 (no change WoW). We’re planning to use the improved accuracy minifier tools to nail the accuracy failures; a usage sketch follows below.
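For reference, a hedged sketch of how we drive the after-AOT accuracy minifier, using the `repro_after` / `repro_level` knobs in `torch._dynamo.config` (also settable via the `TORCHDYNAMO_REPRO_AFTER` / `TORCHDYNAMO_REPRO_LEVEL` environment variables); level 4 requests accuracy minification rather than crash minification:

```python
import torch
import torch._dynamo

# Minify after AOTAutograd; level 4 targets accuracy divergence from eager.
torch._dynamo.config.repro_after = "aot"
torch._dynamo.config.repro_level = 4

@torch.compile
def f(x):
    return torch.sin(x) + 1

# If the compiled output diverges from eager past tolerance, the minifier
# dumps a self-contained repro script (output location varies by version).
f(torch.randn(8))
```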
**The dashboard (as of 675029a).** This week on HUD.
Here are the top-line metric changes from this week:
| Metric | Torchbench | Huggingface | TIMM models |
|---|---|---|---|
| Passrate | 86%, 51/59 → 85%, 50/59 | 98%, 44/45 | 98%, 61/62 → 100%, 62/62 |
| Speedup | 1.07x → 1.09x | 1.40x → 1.41x | 1.03x → 1.08x |
| Comptime | 87s → 84s | 111s → 110s | 132s → 133s |
| Memory | 0.86x | 1.06x | 1.01x |
The graphs suggest it was a bit of a roller coaster week:
- There’s a big regression for torchbench and timm on the 26th, for dynamic shapes only. It is probably due to Replace maybe_guard with statically_known by voznesenskym · Pull Request #99383 · pytorch/pytorch · GitHub (a sketch of the semantic change appears after this list). There was also a big improvement on torchbench yesterday, due to [dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) by anijain2305 · Pull Request #100590 · pytorch/pytorch · GitHub.
- In timm, there was a major improvement from div16 changes for dyn shapes by voznesenskym · Pull Request #99930 · pytorch/pytorch · GitHub
- The passrate regression on torchbench is being tracked at hf_LongFormer failing eval with inductor and dynamic shapes · Issue #100812 · pytorch/pytorch · GitHub
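For readers who haven’t followed the guard work, a sketch of why the statically_known change matters, assuming `statically_known_true` from `torch.fx.experimental.symbolic_shapes` (the helper the PR title refers to; the two code paths below are hypothetical). The older maybe_guard-style checks would install a guard as a side effect, forcing specialization or later recompiles; `statically_known_true` answers only from facts already provable, never adds a guard, and returns False to mean “can’t prove it” rather than “it is false”:

```python
import torch
from torch.fx.experimental.symbolic_shapes import statically_known_true

def add_one(x):
    # Take the aligned fast path only when divisibility by 16 is provable
    # from already-installed guards. Unlike a maybe_guard-style check, this
    # never adds a guard, so it can't cause extra specialization; the cost
    # is that we may conservatively miss the fast path.
    if statically_known_true(x.size(0) % 16 == 0):
        return x + 1                 # hypothetical aligned/vectorized path
    return (x + 1).contiguous()      # hypothetical conservative path
```

This is also the flavor of reasoning behind the div16 improvement above: proving divisibility by 16 lets Inductor emit aligned code without installing extra guards.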
What’s coming next?
- Voz: still grinding on dynamic by default, also moonlighting on improving Deberta performance
- Edward: nail some accuracy bugs, think about de-TensorImpl’ification or partial CUDA graphs application, maybe help on HF
- Horace: optimizing distributed collectives
- Joel: jagged tensor in inductor