State of symbolic shapes: Jul 9 edition
Previous update: State of symbolic shapes branch - #60 by ezyang
Executive summary
- Roadmap review for H2 2023. We had roadmap review for the PyTorch teams last week. Dynamic shapes shows up on the roadmaps as follows: (1) we have a bunch of internal enablement plans which require dynamic shapes to be well supported, so we need to make sure we are on point here (Meta only), (2) we’re really interested in getting good inference performance on LLMs comparable to SOTA, e.g., llama (there are some kv-cache / CUDA graphs pieces here), and (3) there’s still jagged/nested tensor work to do. At a more atomic level, the infra investments that dynamic shapes needs are probably (a) two-level guards for backwards shape guards, (b) improved accuracy/compile-time debugging tools, (c) more aggressive symbolic reasoning enabled by translation validation, (d) obvious Inductor compilation perf improvements, e.g., from split reductions, and (e) unbacked integers for eager mode. I’d also like to finally get vision_maskrcnn and detectron2 working on PT2, but LLMs take priority over this.
- Which operators specialize their inputs? In the old days, dynamic shapes enablement would typically fail because of missing meta functions. These days, things usually don’t fail outright, but your sizes may get specialized anyway and trigger recompiles. @anijain2305 has been sweeping operators to find out which arguments get specialized, to help folks better understand what will be dynamic versus not (see the first sketch after this list for one way to probe this).
- Translation validation landed! “Re-land: Turn translation validation on for tests and accuracy runs by default.” was reverted last week, but has now successfully relanded for real. This paved the way for simplification improvements, including “Value range refinement using uni-variate expressions.”, which are important because they reduce the number of guards we emit in the end (see the second sketch after this list for what translation validation checks).
- Notable bug fixes.
- We landed a few fixes for issues in https://github.com/fxmarty/accelerated-pytorch-transformers-generation/: “Generalize sympy.Rel test to sympy.logic.boolalg.Boolean” and “Allow for torch.sym_int to return int while tracing”; there are a few more coming too.
- Notable new bugs. None of these are user bugs; they were all filed by the team.
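To make the specialization question above concrete, here is a minimal, hypothetical probe (the choice of interpolate as the example op is purely illustrative, not output from the actual sweep): compile with dynamic shapes, call with two different batch sizes, and watch whether the second call recompiles.

```python
import torch

# Hypothetical probe, not the actual sweep tooling: if the second call
# triggers a recompile (visible with TORCH_LOGS=recompiles), the batch
# size was specialized rather than kept symbolic.
@torch.compile(dynamic=True)
def probe(x):
    # interpolate's size/scale arguments are the kind of input that has
    # historically forced specialization
    return torch.nn.functional.interpolate(x, scale_factor=2.0)

probe(torch.randn(4, 3, 8, 8))
probe(torch.randn(8, 3, 8, 8))  # different batch size: does this recompile?
```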
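And to give a flavor of what translation validation is checking, below is an illustrative sketch using Z3 directly (not the actual implementation, which lives under torch.fx.experimental and operates on ShapeEnv guards): a simplification is only sound if the simplified guard set still implies the original guards, i.e., Z3 can find no counterexample.

```python
import z3

# Two symbolic sizes; dynamic dims are assumed >= 2, as in the ShapeEnv.
s0, s1 = z3.Ints("s0 s1")

original = z3.And(s0 >= 2, s1 >= 2, s0 * s1 > 0)  # guards as originally recorded
simplified = z3.And(s0 >= 2, s1 >= 2)             # refinement drops the redundant s0*s1 > 0

# Translation-validation-style check: look for an assignment where the
# simplified guards hold but the original ones do not.
solver = z3.Solver()
solver.add(simplified, z3.Not(original))
print(solver.check())  # unsat => the simplification is sound
```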
CI skips. -3, -1, -1, -2 (no change).
Training dashboard (as of dd6c38cb59). This week on HUD
Metric | Torchbench | Huggingface | TIMM models | Dynamic |
---|---|---|---|---|
Passrate | 91%, 58/64 → 89%, 57/64 | 98%, 45/46 | 100%, 60/60 → 97%, 58/60 | 100%, 8/8 → 88%, 7/8 |
Speedup | 1.08x → 1.11x | 1.58x → 1.60x | 1.21x → 1.20x | 1.30x |
Comptime | 78s → 97s | 152s → 124s | 134s → 178s | 78s → 40s |
Memory | 0.80x | 1.01x → 0.97x | 1.00x | 0.76x → 0.73x |
- vision_maskrcnn went back to failing; it seems flaky.
- eca_botnext26ts_256 and mobilevit_s timed out due to translation validation being enabled. #104654 fixed it (the fix will be visible in the next perf run). The compilation time increase also appears to be due to TV.
Inference dashboard (as of dd6c38cb59). This week on HUD
Metric | Torchbench | Huggingface | TIMM models | Dynamic |
---|---|---|---|---|
Passrate | 86%, 63/73 | 100%, 46/46 → 98%, 45/46 | 100%, 60/60 | 58%, 7/12 |
Speedup | 1.52x | 1.65x → 1.64x | 1.73x | 1.92x → 1.96x |
Comptime | 28s | 44s | 34s | 53s |
Memory | 0.67x | 1.11x | 0.84x | 0.86x |
- GPT2ForSequenceClassification is having trouble across all configurations; it’s currently failing accuracy.
What’s next?
- Edward: Keep helping HF with their llama optimization; two-level guards for backwards