State of symbolic shapes branch

State of symbolic shapes: May 14 edition

Previous update: State of symbolic shapes branch - #54 by ezyang

Executive summary

  • New benchmarks. Edward’s focus this week was adding new benchmarks. We now have autoregressive generation benchmarks (hf_T5_generate and cm3leon_generate), llama is now enabled in our benchmark suite, and coming soon are some new GNN benchmarks as well as a working vision_maskrcnn (now that its eager mode determinism problems are solved). This finally makes good on last month’s promise to add a dynamic-shapes oriented set of benchmarks for more fine-grained tracking; autoregressive generation is a natural fit, since the sequence length grows on every decode step (a sketch of this appears after this list). Once these benchmarks stabilize, we will report on these models specifically in our updates. What’s still missing? We’d also like to get detectron2 working, and it would be good to have llama_generate and gpt2_generate (currently broken due to an HF bug). There’s also not much representation from generative image models at the moment. We also want to track HuggingFace’s compile experiments closely. Finally, the new benchmarks are still quite broken with torch.compile, so more bug fixing is required!
  • Translation validation. Yukio has posted an initial translation validator for our symbolic shape reasoning. How exciting!
  • Hint sets. One hypothesis about the perf regression from State of symbolic shapes branch - #54 by ezyang is that we are nixing optional optimizations that, in practice, would actually be OK to apply. (A simple example is 32-bit indexing, though Natalia tells us this particular one is unlikely to matter.) If this is true, we could recover these optimizations with hint sets: rather than keeping only one hint when Inductor compiles a graph, we maintain a hint for EVERY concrete instantiation of the size vars we have seen so far. Optional optimizations that generalize to all instantiations continue to be allowed (sketched in code after this list). No one is working on this yet, and we don’t know how much it will buy.
  • Towards more determinism. Check out the post “A pure Python implementation of roi_align that looks just like its CUDA kernel”, which discusses how we got vision_maskrcnn passing the eager bitwise determinism check (a sketch of such a check appears after this list).
  • Notable bug fixes.
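
To make concrete why the generation benchmarks exercise dynamic shapes: each decode step appends one token, so the sequence dimension differs on every call, and a static-shape compiler would recompile at every step. Here is a minimal sketch with a toy module (an illustration only, not the actual benchmark harness; the real benchmarks run full HF models through their generation loops):

```python
import torch

class ToyDecoder(torch.nn.Module):
    """Stand-in for a decoder step: attends over everything generated so far."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, seq_len, dim); seq_len grows by one per decode step
        attn = torch.softmax(seq @ seq.transpose(-1, -2), dim=-1)
        return self.proj(attn @ seq)[:, -1:, :]  # next-token representation

model = torch.compile(ToyDecoder(), dynamic=True)  # compile once for all lengths
seq = torch.randn(1, 1, 64)
for _ in range(16):
    nxt = model(seq)  # with static shapes, every iteration would recompile
    seq = torch.cat([seq, nxt], dim=1)
```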
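
To illustrate the hint sets idea in code: the sketch below is purely hypothetical (none of these names exist in Inductor today), but it shows the shape of the check we have in mind: record every concrete size assignment observed so far, and keep an optional optimization enabled only if its precondition holds for all of them.

```python
from typing import Callable, Dict, List

# Hypothetical: one hint set per concrete instantiation of the size vars,
# e.g. [{"s0": 64, "s1": 128}, {"s0": 64, "s1": 4096}, ...]
hint_sets: List[Dict[str, int]] = []

def record_hints(sizes: Dict[str, int]) -> None:
    """Called whenever the compiled graph runs with a new concrete shape."""
    if sizes not in hint_sets:
        hint_sets.append(sizes)

def optimization_allowed(precondition: Callable[[Dict[str, int]], bool]) -> bool:
    """An optional optimization stays on only if its precondition held for
    every concrete instantiation we have seen so far."""
    return all(precondition(hints) for hints in hint_sets)

# Example: 32-bit indexing is safe if every observed tensor fits in int32.
use_int32_indexing = optimization_allowed(
    lambda hints: hints["s0"] * hints["s1"] < 2**31
)
```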
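
For reference, the eager determinism check in question is bitwise: it requires torch.equal across runs, not torch.allclose, so any kernel whose floating-point accumulation order varies between runs (e.g., one built on atomicAdd) fails it. A minimal sketch of such a check (simplified; the benchmark suite’s actual check is more involved):

```python
import torch

def eager_bitwise_deterministic(fn, *args) -> bool:
    """Run fn twice on identical inputs and require bit-for-bit equality.
    torch.equal tolerates no difference at all, unlike torch.allclose."""
    return torch.equal(fn(*args), fn(*args))

# e.g., eager_bitwise_deterministic(model, example_input)
```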

CI skips. -2, -1, 0, -2 (-2, -1, 0, 0 WoW). The new failures are from the new models (cm3leon_generate, hf_T5_generate, and llama) that we’ve enabled in torchbench.

The dashboard (as of e406125). This week on HUD.

| Metric   | Torchbench                     | Huggingface                             | TIMM models                             |
|----------|--------------------------------|-----------------------------------------|-----------------------------------------|
| Passrate | 85%, 50/59 → 90%, 53/59        | 98%, 44/45                              | 100%, 61/61                             |
| Speedup  | 1.09x                          | 1.41x → 1.39x :small_red_triangle_down: | 1.08x → 1.15x                           |
| Comptime | 84s → 90s :small_red_triangle: | 110s → 118s :small_red_triangle:        | 133s → 134s :small_red_triangle:        |
| Memory   | 0.86x → 0.87x                  | 1.06x → 1.00x :small_red_triangle_down: | 1.01x → 0.97x :small_red_triangle_down: |

Some of this is just movement over all models in torchbench. However, in dynamic shapes specifically:

What’s coming next?

  • Voz: dynamic by default
  • Edward: Fix bugs until I die, probably fix the minifier again