State of symbolic shapes branch

State of symbolic shapes: Jun 3 edition

Previous update: State of symbolic shapes branch - #56 by ezyang

Executive summary

  • This update covers two weeks, on account of the Memorial Day holiday, and also because most of the dynamic shapes crew was working on FSDP tracing.
  • PT Core Libraries offsite. PyTorch Core Libraries had an offsite. The big dynamic-shapes-relevant conversation we had was with @jbschlosser on nested tensor support in PT2. There will be an announcement post coming soon about this working E2E; now we need to roll up our sleeves and put it into core PyTorch. One big resolution from our discussion was that it is not necessary to model jagged dimensions in our lowering stack: while having a jagged dimension, e.g., (B, H, W) where height and width can vary, is intuitive for end users, during lowering it is acceptable to represent this as an ordinary 1D dense tensor (BHW,) along with extra metadata that says how to reconstruct the jagged structure (see the sketch after this list). This is because passes like autograd do not care about the jagged structure of the tensor. Additionally, one theme was that a lot of PyTorch library developers really like doing development in PT2 now, so there is a lot of interest in improving the “I want to add a new feature to PT, and I will use PT2 so I don’t have to write a CUDA kernel” workflow. Dynamic shapes is pretty essential for kernel writers!
  • Drowning in bugs. Two broken models, not in the benchmark suite, which I’ve been eyeballing: Fine-tuning HuggingFace wav2vec 2.0 with `torch.compile` · Issue #101160 · pytorch/pytorch · GitHub (this one is broken in a lot of non-dynamic-shape-related ways) and [torch.compile] Name 'Ne' is not defined (Stable Diffusion) · Issue #101228 · pytorch/pytorch · GitHub (there are a bunch of potential small changes which can fix it)
  • A reach-out from FT users. Some folks using FT to do LLM inference are interested in what the long term state of dynamic shapes and PT2 will be; will PT2 be a viable alternative to FT? Today, our gap with FT is moderately significant, especially because we cannot use CUDA graphs with dynamic shapes. However, we hope that (1) with things like KV cache, you do not actually need dynamic shapes (see the KV cache sketch after this list), and (2) with continual improvements to PT2, we will be a competitive and much more user-friendly alternative to FT. Hopefully with our increasing focus on LLMs (thanks @drisspg and the rest of the blueberries folks) we should continue to make progress on this front.
  • Notable new issues.
  • Notable bug fixes.
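
To make the “jagged as dense + metadata” idea from the offsite discussion concrete, here is a minimal sketch. This is not the core nested tensor implementation; the `JaggedSketch` class, its `offsets` layout, and the helper names are purely illustrative.

```python
import torch

# Hypothetical sketch: a jagged batch of sequences with varying lengths,
# stored as one flat dense values tensor plus offsets recording where each
# sequence starts. NOT the core nested tensor API, just an illustration of
# lowering jagged structure to "dense tensor + metadata".
class JaggedSketch:
    def __init__(self, values: torch.Tensor, offsets: torch.Tensor):
        self.values = values    # shape (total_elements,), an ordinary dense tensor
        self.offsets = offsets  # shape (B + 1,); offsets[i]:offsets[i+1] is sequence i

    @staticmethod
    def from_list(tensors):
        lengths = torch.tensor([t.numel() for t in tensors])
        offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])
        return JaggedSketch(torch.cat([t.reshape(-1) for t in tensors]), offsets)

    def unbind(self):
        # Reconstruct the per-sample views from the flat buffer + metadata.
        return [self.values[self.offsets[i]:self.offsets[i + 1]]
                for i in range(self.offsets.numel() - 1)]

# Pointwise ops (and autograd through them) only need the flat dense tensor;
# the jagged structure rides along as metadata.
jt = JaggedSketch.from_list([torch.randn(3), torch.randn(5), torch.randn(2)])
out = JaggedSketch(torch.relu(jt.values), jt.offsets)
print([t.shape for t in out.unbind()])  # [torch.Size([3]), torch.Size([5]), torch.Size([2])]
```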
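On point (1) of the FT item above, here is a rough sketch of why a preallocated KV cache sidesteps dynamic shapes during decoding: every step sees tensors of the same fixed shape. Illustrative only; `max_seq_len`, the cache layout, and `decode_step` are made up for the example and are not how FT or any particular inference stack does it.

```python
import torch

# Illustrative only: a statically-shaped KV cache for autoregressive decoding.
# The cache is preallocated at max_seq_len, so every decode step operates on
# tensors of the same fixed shape regardless of how long generation runs.
batch, n_heads, max_seq_len, head_dim = 2, 8, 1024, 64
k_cache = torch.zeros(batch, n_heads, max_seq_len, head_dim)
v_cache = torch.zeros(batch, n_heads, max_seq_len, head_dim)

def decode_step(pos: int, k_new: torch.Tensor, v_new: torch.Tensor, q: torch.Tensor):
    # Write this step's key/value into the fixed-size cache...
    k_cache[:, :, pos] = k_new
    v_cache[:, :, pos] = v_new
    # ...and attend over the full (static-shape) cache, masking unused slots.
    scores = torch.einsum("bhd,bhsd->bhs", q, k_cache) / head_dim ** 0.5
    scores = scores.masked_fill(torch.arange(max_seq_len) > pos, float("-inf"))
    attn = scores.softmax(dim=-1)
    return torch.einsum("bhs,bhsd->bhd", attn, v_cache)

# Every call has identical tensor shapes, so a static-shape compile (and even
# CUDA graphs) can be reused across decode steps.
out = decode_step(0, torch.randn(batch, n_heads, head_dim),
                  torch.randn(batch, n_heads, head_dim),
                  torch.randn(batch, n_heads, head_dim))
print(out.shape)  # torch.Size([2, 8, 64])
```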

CI skips. -1, -1, -1, -2 (+1, 0, -1, 0). hf_T5_generate and cm3leon_generate are now passing (though hf_T5_generate in a somewhat hacky way). The new failure is nanogpt_generate, which was previously failing even with static shapes; a new work item for us.

The dashboard (as of 8215468870). This fortnight on HUD

There is a discontinuity in speedup due to a change in how we compute it: we now (1) clamp per-model speedups at 1x (previously, a PT2-induced slowdown could depress the overall speedup; this is why the TorchBench number was revised from 1.12x to 1.15x), and (2) include models that fail accuracy in the geomean speedup as 1x (this depresses the geomean speedup; e.g., HuggingFace was revised from 1.48x to 1.45x).
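
Concretely, my reading of the new aggregation is roughly the following; this is illustrative only, not the actual benchmark script.

```python
import math

# Each entry: (speedup, passed_accuracy). Under the new counting, speedups
# below 1x are clamped up to 1x, and models that fail accuracy contribute 1x.
results = [(1.8, True), (0.7, True), (1.3, False), (1.5, True)]

adjusted = [max(s, 1.0) if ok else 1.0 for s, ok in results]
geomean = math.exp(sum(math.log(s) for s in adjusted) / len(adjusted))
print(f"{geomean:.2f}x")
```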

| Metric   | Torchbench    | Huggingface   | TIMM models   |
|----------|---------------|---------------|---------------|
| Passrate | 88%, 56/64    | 98%, 44/45    | 100%, 60/60   |
| Speedup  | 1.15x → 1.16x | 1.45x → 1.53x | 1.20x → 1.22x |
| Comptime | 79s           | 100s → 103s :small_red_triangle: | 134s → 135s :small_red_triangle: |
| Memory   | 0.93x → 0.94x | 0.97x → 1.00x | 1.01x         |

Notes:

I probably ought to report inference numbers too, but they are only being run twice a week and our latest set of improvements is not in an official benchmark run.

What’s next?

  • Voz: working on tracing FSDP
  • Edward: fixing bugs