[RFC] Improve Dynamic Shapes Support Across ATen Operators and Expand Test Coverage

I’d like to propose a focused effort to improve dynamic shapes support across ATen operators and expand the corresponding test coverage. I would appreciate any feedback on the scope, prioritization, or approach, and I am happy to adjust based on the community’s needs.

Motivation

When `torch.compile(dynamic=True)` works well, users get flexible compiled programs that avoid excessive recompilation as tensor dimensions change. In practice, however, many ATen operators still call concrete-shape APIs (`numel()`, `size()`, `sizes()`) instead of their symbolic counterparts (`sym_numel()`, `sym_size()`, `sym_sizes()`), or use `guard_int` / `guard_static_shape` in Inductor lowerings where symbolic arithmetic would suffice.

The result is that users hit unexpected errors like:

RuntimeError: Cannot call numel() on tensor with symbolic sizes/strides

or experience silent recompilation when guards are placed on dimensions that didn’t need them.

Four Failure Modes

My audit of the existing xfail lists and the C++/Python operator code shows that operators fail in four distinct ways, ranging from immediate crashes to unnecessary guards. Understanding these modes is essential for choosing the right fix:

| Mode | Frequency | Error | Root cause | Fix pattern |
|---|---|---|---|---|
| 1: Direct throw | ~80% | `Cannot call numel()/sizes()/strides() on tensor with symbolic sizes/strides` | C++ `.numel()`/`.sizes()`/`.size(dim)`/`.strides()` return concrete types and throw immediately on a symbolic tensor | Replace with `sym_numel()`, `sym_sizes()`, `sym_size(dim)`, `sym_strides()` |
| 2: Python SymInt branch | ~10% | `GuardOnDataDependentSymNode` or an unexpected guard | Python `.numel()`/`.size()` already return a `SymInt` (no throw), but code such as `if val == 0:` branches on it | Wrap the condition with `guard_or_false()` or state it with `torch._check()` |
| 3: TensorIterator reject | ~5% | `TensorIterator does not support symbolic shapes` | TensorIterator checks the `has_symbolic_sizes_strides_` flag directly | Implement the op in `torch/_refs` |
| 4: Implicit SymInt → int64 | ~5% | Over-specializing guard, or a throw in `guard_int()` | After Mode 1 is fixed, the returned `SymInt` flows into code expecting `int64_t` (loops, pointers, `parallel_for`) | Restructure to avoid needing a concrete value, or decompose |

Most operators fall into Mode 1 and are straightforward to fix. Modes 2-4 require progressively more judgment but are well-understood patterns with existing precedent in the codebase.

For an example of this kind of fix, see PR #182004, which fixes `cross_entropy_loss` for dynamic shapes.

I think a systematic pass would meaningfully improve the `torch.compile(dynamic=True)` experience for users.

Approach

I plan to land this as a series of small, focused PRs, each fixing a handful of related operators and removing their corresponding test xfails. This keeps each change easy to review and easy to revert if needed, and lets CI catch any regressions early.

For each operator, the workflow is:

  1. Reproduce the failure with a minimal `torch.compile(dynamic=True)` script

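  2. Fix the root cause using the pattern for its failure mode: in the C++ operator (replace concrete APIs with symbolic equivalents), in the Inductor lowering (replace `guard_int` with symbolic arithmetic), or in the Python code path (use `guard_or_false()` / `torch._check()`)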

  3. Remove xfails from the relevant test suites

  4. Add targeted regression tests where coverage is thin, following the pattern established in PR #182004

Benefits

  • Better user experience: Fewer cryptic crashes and fewer silent recompilations when using `torch.compile(dynamic=True)`

  • Broader operator coverage: A more complete and reliable dynamic shapes story across the operator surface area

  • Improved CI signal: Converting xfails to passing tests gives earlier warning if future changes regress dynamic shape support

  • Incremental and low-risk: Small PRs mean easy review and safe rollback

How You Can Help

  • Prioritization feedback: If any of the listed operators are particularly important to your workloads, please let us know so we can tackle those first.

  • Additional operators: Here is our current list of operators [Operators List]. If you’ve encountered dynamic shape issues with operators not listed here, we’d love to hear about them.

  • Review bandwidth: We’d appreciate reviewer support as we work through the list. Each PR should be small and self-contained.

Thank you for reading! I am excited to chip away at this and would appreciate any thoughts or suggestions.

cc @morrison-turnansky , @groenenboomj

LGTM. Thank you for working on this!