State of symbolic shapes branch: Sep 25 edition

The symbolic shapes branches (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub; torchdynamo: [WIP branch] symbolic shape hacking by Chillee · Pull Request #1180 · pytorch/torchdynamo · GitHub) are long-running branches in PyTorch/torchdynamo containing a large number of features and bug fixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch

Commit IDs at time of writing: pytorch 538031e232635ce1cd8c8d3ec54f4da14142d4d8; torchdynamo 3ddb46e873c2bdd1c59217a128b9b2b7af8696fe (unchanged)

Executive summary

We spent a lot of time this week merging changes to master, reducing the number of inserted lines in the branch by 50%. This merging process was not entirely smooth; more details in the next section. We did not make much progress in increasing operator coverage; however, Alban Desmaison and Anjali Chourdia are onboarding onto the project, and Nick Korovaiko is pivoting to working on model enablement, so we expect the pace to pick up soon.

We are currently working on updating the runbooks for symbolic shapes; the relevant docs are in various states of up-to-dateness.

Previous branch diff: 74 files changed, 2880 insertions(+), 506 deletions(-)
Current branch diff: 60 files changed, 1464 insertions(+), 389 deletions(-)

Lowlight this week: master currently has a 5x regression on trace time due to https://github.com/pytorch/pytorch/pull/85239 ; a portion of this change is currently reverted on the branch by Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub

Retrospective on merge to master

In this section, I want to call out unexpected complications that occurred while merging PRs to master.

  • OpInfo for Slice - adding a new OpInfo for a torch.ops OpOverload caused a lot of spurious failures in preexisting OpInfo tests, which assumed that all operators exist in the torch.X namespace. To land this PR, we modified many of the offending OpInfo tests to ignore these OpInfos; contrary to our initial expectations, this was not that painful (using -k slice to filter tests to only run our new slice OpInfo helped a lot). A minimal sketch of such an OpInfo appears after this list.
  • More SymFloat support - this tripped the public bindings test, but due to Test public bindings in CI gives weird output on error · Issue #83393 · pytorch/pytorch · GitHub the CI failure was difficult to understand. Furthermore, many of us compile with distributed disabled, but the test did not run when distributed is disabled. We root-caused the public bindings failure (it happens because CI retries the test) and also fixed the test to run even when distributed is disabled.
  • Correctly handle duplicate arguments to AOTAutograd - the initial version of this patch was fine, but the suggested refactor to move it further up the stack failed due to unclear invariants about whether or not None tensor arguments are allowed in AOTAutograd. A careful refactor of AOTAutograd here should help. Note that this patch was not strictly needed for dynamic shapes, but we had to fix the underlying bug to land Setup fake tensor and symbolic shapes once at beginning of AOTAutograd. (A sketch of the dedup idea appears at the end of this section.)
  • as_strided symbolic support - This patch uncovered two problems. First, our solution for not requiring XLA submodule updates was not actually working, because call_fallback_fn utilized the at::_ops interface, whose types eagerly update when you change signatures to SymInt. We changed this function to strip SymInt (use call_fallback_fn_symint to preserve SymInts), and we were able to land this PR without an XLA submodule update. Second, this PR triggered a Valgrind error on old versions of clang; after a day of investigating, we decided that this was a clang9 bug and disabled Valgrind on this version of clang. However, it also indicates that we perhaps shouldn’t be passing c10::SymInt by value, which is what we currently do.
  • Ported matmul compositeimplicitautograd impl into core also turned on the Python dispatcher by default, which triggered a latent bug in linalg_lstsq that we had to fix in Fixed memory issues in linalg_lstsq. The memory corruption was difficult to diagnose; it was eventually solved by reading the code and noticing improper overwriting of a Tensor&. It would be good to audit the codebase for further instances of assigning over a mutable tensor reference.
  • Symintifying slice ops triggered Vulkan failures, as a Vulkan test was passing MIN_INT as an argument, which is not representable as SymInt (though for no good reason). This suggests that we should perhaps also allow extremely negative integer constants. An added confounder on the diff was that the PR author attempted to fix the problem by changing at::slice into at::native::slice, which caused more problems, as the Vulkan slice implementation was bypassed.
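For reference, here is a minimal sketch of what an OpInfo over a torch.ops OpOverload looks like. The sample-input helper and the exact kwargs here are illustrative, not the ones from the actual PR:

import torch
from torch.testing._internal.common_methods_invocations import OpInfo, SampleInput

# Sample inputs for aten::slice.Tensor(Tensor self, int dim=0,
# SymInt? start=None, SymInt? end=None, SymInt step=1)
def sample_inputs_slice(op_info, device, dtype, requires_grad, **kwargs):
    t = torch.randn(10, 10, device=device, dtype=dtype,
                    requires_grad=requires_grad)
    yield SampleInput(t, args=(0, 2, 8, 1))

slice_opinfo = OpInfo(
    "ops.aten.slice",                    # name chosen for illustration
    op=torch.ops.aten.slice.Tensor,      # an OpOverload, not a torch.X function
    sample_inputs_func=sample_inputs_slice,
    dtypes=(torch.float32,),
    supports_out=False,
)

Running the test suite with -k slice then exercises only this entry.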

Overall, it looks like we were able to address the root cause of most of the problems we encountered when landing PRs, which suggests that future landings should be smoother.
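As promised above, here is a hedged sketch of the duplicate-argument dedup idea from the AOTAutograd patch; the helper names are hypothetical, not AOTAutograd's actual API. The point is that if the same tensor object is passed twice, the compiled graph should see it once, with the original calling convention reconstructed at call time:

# Hypothetical helpers illustrating the dedup idea; not AOTAutograd's code.
def dedupe_args(args):
    seen = {}          # id(arg) -> position in the deduped list
    deduped = []
    index_map = []     # for each original arg, its index into deduped
    for a in args:
        if id(a) not in seen:
            seen[id(a)] = len(deduped)
            deduped.append(a)
        index_map.append(seen[id(a)])
    return deduped, index_map

def redistribute(deduped, index_map):
    # Rebuild the original (possibly duplicated) argument list.
    return [deduped[i] for i in index_map]

x = object()                       # stand-in for a tensor
args = [x, x]                      # the same tensor passed twice
deduped, index_map = dedupe_args(args)
assert len(deduped) == 1 and redistribute(deduped, index_map) == args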

What should I expect to work?

The end-to-end command for BERT_pytorch is unchanged:

TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/torchbench.py --only BERT_pytorch --accuracy-aot-nop --training

Note that if you are on a more recent commit of dynamo (which is OK if you’re not using inductor), the command-line flags have changed. You will instead have to run:

TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/torchbench.py --only BERT_pytorch --accuracy --backend aot_eager --training

We recommend setting TORCH_SHOW_CPP_STACKTRACES=1 for more informative C++ stack traces.

What’s new in the branch this week?

As always, PRs that were merged to master are only listed in the master section.

What’s made it to master this week?

NB: cpp pytree was removed from the branch, without merging to master

What’s coming next?

  • Add SymInt to Scalar (Add SymInt to Scalar by eellison · Pull Request #84958 · pytorch/pytorch · GitHub)
  • Get testing for backwards and symbolic shapes working (assigned to @Chillee )
  • More merge to master. Subtasks from last week still apply:
    • Get min-cut partitioner working with symbolic shapes
    • Figure out what tests are failing on the branch
  • Full operator coverage for all benchmark models
    • Fallback implementation for custom operators without symbolic shape propagation, inferred by running the fallback on real tensors (can be done later; a sketch of the idea follows below)
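To make the last item concrete, here is a hedged sketch of that fallback idea, under the assumption that we materialize real zero tensors at the fake tensors' hint sizes, run the real operator, and keep only the output metadata. infer_fake_result is a hypothetical name, not an API on the branch:

import torch

# Hypothetical helper, not an API on the branch: infer output metadata for a
# custom op with no symbolic shape rule by running it on real stand-in inputs.
def infer_fake_result(op, fake_args):
    real_args = [
        # int(s) falls back to the size hint backing each symbolic dim
        torch.zeros([int(s) for s in a.shape], dtype=a.dtype)
        if isinstance(a, torch.Tensor) else a
        for a in fake_args
    ]
    real_out = op(*real_args)
    # Only shape/dtype are meaningful; the values are garbage by construction.
    return torch.empty(real_out.shape, dtype=real_out.dtype)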