State of symbolic shapes branch

State of symbolic-shapes branch: Sep 17 edition

The symbolic-shapes branches (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub ; torchdynamo: [WIP branch] symbolic shape hacking by Chillee · Pull Request #1180 · pytorch/torchdynamo · GitHub) are long-running branches in PyTorch/torchdynamo containing a large number of features and bugfixes related to dynamic shapes support in PyTorch.

Commit IDs at time of writing: pytorch e508e5ce3adaa3464f210e26e738e53d4ec4718c; torchdynamo 3ddb46e873c2bdd1c59217a128b9b2b7af8696fe

Executive summary

We started this branch three weeks ago to move more quickly on adding dynamic shapes support to PyTorch, as getting past master CI was a bottleneck for our work. We made a lot of progress: this branch successfully runs BERT_pytorch forward/backward with the no-op AOTAutograd backend, and the forward mode is compilable by Inductor, producing a kernel that we have verified works with varying batch sizes without inducing recompilation.

From this work, we discovered that tracing with dynamic shapes is quite slow, largely because we were generating too many FX nodes for symbolic ints. Over the last week, we made a lot of progress optimizing tracing-time overhead: on devfair040, E2E BERT_pytorch forwards-backwards on aot-nop went from 116.69s to 56.68s.

Based on our progress and discussions with Jason Ansel, we are now pivoting to enabling dynamic shapes by default for all our benchmark models, so that we can start exercising inductor on dynamic shapes. This involves merging the symbolic-shapes branch to master and blitzing operator support for torchbench.

Current branch diff: 74 files changed, 2880 insertions(+), 506 deletions(-)

What should I expect to work?

We are currently using the following command to test end-to-end:

TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/torchbench.py --only BERT_pytorch --accuracy-aot-nop --training

What’s new in the branch this week?

NB: some changes were added to the branch and then merged to master; those are listed in the merged-to-master section only.

What’s made it to master this week?

What’s coming next?

High priority:

  • Merge into master. Items to merge are annotated with comments on the symbolic-shapes PR. This includes some subtasks:
    • Get min-cut partitioner working with symbolic shapes
    • Figure out what tests are failing on the branch
  • Get fake tensors and AOTAutograd working / Integrate inductor/dynamo dynamic shape analysis “properly” (related to sharing fake tensors) https://github.com/pytorch/pytorch/pull/85233
  • Full operator coverage for all benchmark models
  • Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators (can be done later)

Low priority:

  • Figure out why accuracy fails on E2E BERT
  • Get inductor working E2E with training on BERT
  • Get hf_BERT working (BERT_pytorch is a different model)
  • (Low priority atm) Get more models working

Awesome work!

I’m wondering if this project might be helpful for supporting dynamic shapes in LTC, given that many ops have been migrated to support SymInts now.

There is a decent amount of LTC dynamic shapes logic that exists, but it was mostly added to appease the build/CI; we aren’t working on actually making LTC work end to end.

State of symbolic shapes branch: Sep 25 edition

The symbolic-shapes branches (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub ; torchdynamo: [WIP branch] symbolic shape hacking by Chillee · Pull Request #1180 · pytorch/torchdynamo · GitHub) are long-running branches in PyTorch/torchdynamo containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch

Commit IDs at time of writing: pytorch 538031e232635ce1cd8c8d3ec54f4da14142d4d8; torchdynamo 3ddb46e873c2bdd1c59217a128b9b2b7af8696fe (unchanged)

Executive summary

We spent a lot of time this week merging changes to master, reducing the number of inserted lines in the branch by 50%. This merging process was not entirely smooth; more details in the next section. We did not make much progress in increasing operator coverage; however, Alban Desmaison and Anjali Chourdia are onboarding onto the project, and Nick Korovaiko is pivoting to working on model enablement, so we expect the pace to pick up soon.

We are currently working on updating the runbooks for symbolic shapes. Here are the relevant docs, in various states of up-to-dateness:

Previous branch diff: 74 files changed, 2880 insertions(+), 506 deletions(-)
Current branch diff: 60 files changed, 1464 insertions(+), 389 deletions(-)

Lowlight this week: master currently has a 5x regression in trace time due to https://github.com/pytorch/pytorch/pull/85239; a portion of this change is currently reverted on the branch by Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub

Retrospective on merge to master

In this section, I want to call out unexpected complications that occurred while merging PRs to master.

  • OpInfo for Slice - adding a new OpInfo for a torch.ops OpOverload caused a lot of spurious failures in preexisting OpInfo tests, which assumed that all operators existed in the torch.X namespace. To land this PR, we modified many of the offending OpInfo tests to ignore such OpInfos; contrary to our initial expectations, this was not that painful (using -k slice to filter tests to only run our new slice OpInfo helped a lot).
  • More SymFloat support - this triggered the public bindings test, but due to Test public bindings in CI gives weird output on error · Issue #83393 · pytorch/pytorch · GitHub the CI failure is difficult to understand. Furthermore, many of us compile with distributed disabled, but the test does not run when distributed is disabled. We root-caused the public bindings failure (it’s because CI retries the test) and also fixed the test to run even when distributed is disabled.
  • Correctly handle duplicate arguments to AOTAutograd - the initial version of this patch was fine, but the suggested refactor to move it further up the stack failed due to unclear invariants about whether or not None tensor arguments are allowed in AOTAutograd. A careful refactor of AOTAutograd here should help. Note that this patch was not strictly needed for dynamic shapes, but we had to fix this underlying bug to land Setup fake tensor and symbolic shapes once at beginning of AOTAutograd
  • as_strided symbolic support - This patch uncovered two problems. First, our solution to avoid requiring XLA submodule updates was not actually working, because call_fallback_fn utilized the at::_ops interface, whose types eagerly update when you change them to SymInt. We changed this function to strip SymInt (use call_fallback_fn_symint to preserve SymInts), and we were able to land this PR without an XLA submodule update. Second, this PR triggered a Valgrind error on old versions of clang; after a day of investigating, we decided that this was a clang9 bug and disabled Valgrind on this version of clang. However, it also indicates that we perhaps shouldn’t be passing c10::SymInt by value, which is what we currently do.
  • Ported matmul compositeimplicitautograd impl into core also turned on the Python dispatcher by default, which triggered a latent bug in linalg_lstsq that we had to fix in Fixed memory issues in linalg_lstsq. The memory corruption was difficult to diagnose; it was eventually solved by reading the code and noticing improper overwriting of a Tensor&. It would be good to audit the codebase for further instances of assigning over a mutable tensor reference (see the sketch after this list).
  • Symintifying slice ops triggered Vulkan failures: a Vulkan test was passing MIN_INT as an argument (for no good reason), and MIN_INT is not representable as a SymInt. This suggests that perhaps we should also allow extremely negative integer constants. An added confounder on the diff was that the PR author attempted to fix the problem by changing at::slice into at::native::slice, which caused more problems because the Vulkan slice implementation was bypassed.
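
To make the Tensor& hazard from the linalg_lstsq item above concrete, here is a minimal, self-contained C++ sketch. It uses a toy refcounted handle rather than the real at::Tensor, so the names are purely illustrative; the point is that assigning over a mutable reference parameter rebinds the caller’s object to new storage instead of writing into the storage the caller already holds:

    #include <cassert>
    #include <memory>
    #include <vector>

    // Toy stand-in for a refcounted tensor handle: assigning one handle to
    // another rebinds it to different storage; it does not copy data in place.
    struct Handle {
      std::shared_ptr<std::vector<float>> storage;
      float* data() { return storage->data(); }
    };

    // Hazardous pattern: assigning over the mutable reference parameter drops
    // the caller's original storage instead of filling it in.
    void fill_ones_bad(Handle& out) {
      Handle fresh{std::make_shared<std::vector<float>>(4, 1.0f)};
      out = fresh;  // the caller's old storage can be freed right here
    }

    // Safer pattern: mutate the storage the caller already owns.
    void fill_ones_good(Handle& out) {
      out.storage->assign(4, 1.0f);
    }

    int main() {
      Handle h{std::make_shared<std::vector<float>>(4, 0.0f)};
      float* p = h.data();  // pointer into the original storage
      fill_ones_bad(h);     // original storage is freed; p now dangles
      (void)p;              // dereferencing p here would be a use-after-free,
                            // the kind of corruption that is hard to diagnose
      fill_ones_good(h);
      assert(h.data()[0] == 1.0f);
      return 0;
    }

An audit for this pattern would essentially look for functions that take a Tensor& and assign a freshly created tensor to it, rather than writing into it with an in-place operation such as copy_().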

Overall, it looks like we were able to address the root cause for most of the problems we encountered when landing PRs, which suggests that future landing should be smoother.

What should I expect to work?

The end-to-end command for BERT_pytorch is unchanged:

TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/torchbench.py --only BERT_pytorch --accuracy-aot-nop --training

Note that if you are on a more recent commit of dynamo (which is OK if you’re not using inductor), the command line flags have changed. You will instead have to run:

TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/torchbench.py --only BERT_pytorch --accuracy --backend aot_eager --training

We recommend using TORCH_SHOW_CPP_STACKTRACES=1 and TORCH_SHOW_CPP_STACKTRACES_WITH_LINENO=1 for more informative C++ stack traces.

What’s new in the branch this week?

As always, merged to master PRs are only listed in the master section.

What’s made it to master this week?

NB: cpp pytree was removed from the branch, without merging to master

What’s coming next?

  • Add SymInt to Scalar (Add SymInt to Scalar by eellison · Pull Request #84958 · pytorch/pytorch · GitHub)
  • Get testing for backwards and symbolic shapes working (assigned to @Chillee )
  • More merge to master. Subtasks from last week still apply:
    • Get min-cut partitioner working with symbolic shapes
    • Figure out what tests are failing on the branch
  • Full operator coverage for all benchmark models
    • Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators (can be done later)

State of symbolic shapes branch: Oct 2 edition

The symbolic-shapes branches (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub ; torchdynamo: [WIP branch] symbolic shape hacking by Chillee · Pull Request #1180 · pytorch/torchdynamo · GitHub) are long-running branches in PyTorch/torchdynamo containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch

Commit IDs at time of writing: pytorch eea837d4fa2f287c6d8633387c5fa7e7ed8dc9e9; torchdynamo 3ddb46e873c2bdd1c59217a128b9b2b7af8696fe (unchanged)

Executive summary

We made a lot of progress:

Runbooks:

Previous branch diff: 60 files changed, 1464 insertions(+), 389 deletions(-)
Current branch diff: 41 files changed, 1044 insertions(+), 448 deletions(-)

Retrospective on merge to master

What were unexpected complications when merging PRs to master?

  • Enable convolution_backward with bias and symints required adding support for another SymInt type (SymInt[]?). Fortunately, this was relatively symmetric with the other type support we’ve added, but it took Nick a day or so to do the first pieces, and there were two more subtle bits that Nick didn’t manage on his first try. First, aten/src/ATen/core/jit_type.h needed to be modified with care: whenever a SymInt-including type (or container type) is written for getTypePtr_, you must instead implement it on getMaybeFakeTypePtr_ and ensure you either recursively pass on the fake template argument, or resolve it to an int (if fake is true) or a SymInt (if fake is false). This lets us maintain the illusion for TorchScript that SymInt C++ types turn into conventional int-only schema. Second, aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h needed another case to allow an OptionalArray to be provided by an int-list IValue (covariant typing). We hope that this is the very last C++ type we need to add for SymInts (we now have SymInt, SymInt[], SymInt?, and SymInt[]?, which covers all used permutations of optional and array types on SymInt; there are no uses of int?[] in native_functions.yaml).
  • Consistent compute numel/contiguous strategy with SymInts involved refactoring some code in a way that introduced an unsigned underflow bug: previously we had the expression int64_t i = dim() - 1, where dim() returned an int64_t; this was refactored into int64_t i = sizes.size() - 1, where sizes.size() returns a size_t (an unsigned type). When the size is zero, the subtraction underflows to a very large number, which then triggers signed conversion UB (as the very large unsigned value is not representable as a signed integer). The UB appears to resolve to a negative number as desired on most platforms, but not on Android. It is not clear why UBSAN did not catch the error on conventional platforms. This suggests we need more unit testing of TensorImpl (not done yet); see the sketch after this list for a distilled version of the bug.
  • Revert “Revert “Symintifying slice ops (#85196)”” was reverted for breaking Executorch fbcode-only builds. This is because Executorch was translating SymInt JIT type into SymInt C++ type, but the Executorch runtime does not (and should not) support SymInt. This was fixed by refactoring the codegen to offer an “int-only codegen mode” (by passing symint=False argument to relevant functions), and then having Executorch opt out of SymInt codegen.
  • Revert “Revert “Symintified mmm/addmm derivative formulas (#85794)”” was reverted because it broke Mac AOTAutograd tests. It was resolved by marking linalg.lu_solve as flaky, but it’s not clear what the root cause was.
  • removed compile cache and static argnums didn’t cause any problems on the way in, but we got complaints from users that they were relying on the cache, and so removing it broke their code, e.g., Functorch memory_efficient_fusion gives wrong output batch size · Issue #86020 · pytorch/pytorch · GitHub
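
The size_t underflow described above is easy to reproduce in isolation. Below is a small, self-contained C++ sketch (not the actual TensorImpl code) showing how switching the loop bound from a signed dim() to an unsigned sizes.size() turns the empty-sizes case from a loop that never runs into a wrapped-around index:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
      std::vector<int64_t> sizes;  // models a zero-dimensional tensor

      // Before the refactor: dim() returned a signed int64_t, so
      // int64_t i = dim() - 1 is simply -1 and the loop body never runs.
      int64_t dim = static_cast<int64_t>(sizes.size());
      for (int64_t i = dim - 1; i >= 0; --i) {
        // never entered when dim == 0
      }

      // After the refactor: sizes.size() is an unsigned size_t, so
      // sizes.size() - 1 wraps around to a huge value before any conversion
      // back to a signed integer happens.
      std::size_t wrapped = sizes.size() - 1;
      std::printf("sizes.size() - 1 as unsigned: %zu\n", wrapped);

      // One fix: cast to a signed type before doing the arithmetic.
      for (int64_t i = static_cast<int64_t>(sizes.size()) - 1; i >= 0; --i) {
        // safe when sizes is empty
      }
      return 0;
    }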

Though not directly caused by the work on symbolic shapes, Edward had to spend some time this week fixing an iOS SEV caused by insufficient CI for the Metal backend: Insufficient CI for metal backend · Issue #84172 · pytorch/pytorch · GitHub. This is relevant to us because we caused a similar iOS SEV by changing the signatures of C++ functions; however, we subsequently made changes that don’t require us to update C++ signatures, reducing the likelihood of a repeat SEV along these lines. There is also now a PR up to test Metal in OSS CI.

What should I expect to work?

The end-to-end command for BERT_pytorch is unchanged:

TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/torchbench.py --only BERT_pytorch --accuracy --backend aot_eager --training

We recommend using TORCH_SHOW_CPP_STACKTRACES=1. The other models listed in Operators that need symbolic support - Google Sheets are expected to work, but we do not have CI at the moment, so some minor regressions may occur during development.

What’s new in the branch this week?

As always, merged to master PRs are only listed in the master section.

What’s made it to master this week?

PRs marked with X have discussion in “Retrospective on merge to master”

What’s coming next?

  • Improved testing
    • Make sure we are testing meta functions that aren’t registered to the dispatcher
    • Improve stride testing coverage
  • More merge to master.
  • Get min-cut partitioner working with symbolic shapes (we are in better shape thanks to @wconstab but there are still problems)
  • Full operator coverage for all benchmark models (as infra is nearly all landed on master, it is a good time to start also merging these operators into master)
  • Consolidated guard strategy across torchdynamo/AOTAutograd/torchinductor (@Chillee is probably going to work on this)
  • Fallback implementation for custom operators without symbolic shape propagation, inferred by running fallback on real operators (can be done later)