State of symbolic shapes branch

State of symbolic shapes branch: Oct 22 edition

The symbolic-shapes branch (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub ) is a long-running branch containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch - #7 by ezyang

Commit ID at time of writing: 5f11aa560bdb406e6826e355edf19bda6174d63f

Executive summary

We focused on model enablement this week. We’re starting to hit the last mile on training, which means our pace is slowing as we spend time fixing more complicated bugs, though the pace on the branch is still faster than the rate we are merging code to master.

  • This major bug deserves a bullet of its own: we identified that torchdynamo was over-suppressing errors (even when dynamic shapes was off), artificially inflating the pass rate on the dynamic shapes dashboard. This was fixed in [dynamo] Unify raise_on_* config to suppress_errors and raise by default by ezyang · Pull Request #87440 · pytorch/pytorch · GitHub . This update post reports training status with errors suppressed, but in subsequent update posts we will stop suppressing errors for a more accurate depiction of our status. (On the plus side, error suppression would not have affected inductor speedup numbers, so even if we were skipping some blocks when compiling, we are still getting good speedups.)
  • Model training status on symbolic-shapes (run by @ezyang); see also Operators that need symbolic support - Google Sheets for hand-audited results (not completely up to date at the moment, but actively used for task assignment). These runs were done prior to the error-suppression fix, so errors are suppressed in the numbers below.
    • torchbench: 44 out of 55 (+8 WoW)
    • huggingface: 34 out of 44 (new)
    • timm: 38 out of 62 (new)
  • Model inference status on master (run by @ezyang); these runs are all WITHOUT suppressing errors. The pass rate is artificially depressed because the PyTorch build used for these runs did not have numpy support.
    • torchbench inductor: 27 out of 46 (+18 WoW)
    • torchbench aot_eager: 35 out of 46 (+20 WoW)
    • torchbench aot_eager, without dynamic shapes (baseline): 39 out of 46 (new)
  • OpInfo tests on symbolic-shapes (@ezyang)

Previous branch diff: 30 files changed, 1209 insertions(+), 225 deletions(-)
Current branch diff: 70 files changed, 1681 insertions(+), 430 deletions(-)

Notable bug fixes

  • [HACK] Don’t clone, instead don’t clobber proxies. This is a multi-week saga (see the sketch after this list). The beginning of the story is that we noticed partitioning was failing, because occasionally size computation in the forward pass depended on size expressions on backward inputs. @wconstab identified that this was because we were overwriting proxies when a SymIntNode was reused on multiple tensors. To fix this, Will introduced a clone when we set SymInts into tensors (Clone symint on set_sizes_and_strides by wconstab · Pull Request #85878 · pytorch/pytorch · GitHub), so that when we assigned proxies to the SymInts of a tensor returned by an operator, we would always have fresh SymIntNodes, preventing overwriting. However, after this fix landed, we noticed that occasionally SymIntNodes would show up that didn’t have any proxies at all! Brian fixed one particular instance of this (functionalization: skip meta reference compute for aot autograd), but there were others. We still don’t know how this situation arises (the minifier didn’t work), but @ezyang is testing an alternate fix on the branch, where instead of cloning SymInts, we simply avoid overwriting proxies if one already exists. This is a hack because it means we can’t faithfully report user programs (e.g., if you write r = torch.add(x, y); s = r.size(0), this might end up being reported as s = x.size(0)), but after a few days of testing it seems to have fixed the problem without causing any other regressions on benchmark models.
  • Fixed FakeTensor not calling CompositeImplicitAutograd decomps sometimes. @anjali411 noticed that one of her models was stuck in an infinite loop involving _to_copy decomps. She and @ngimel unsuccessfully tried to debug it. @Chillee eventually pinned down the root cause by tracing through each decomposition the loop went through, and breaking the loop with a small change to fake tensor. This spawned a discussion about our use of decomposition tables being too complicated, which @SherlockNoMad has been working to refactor.
  • fix minifier and minor improvements to minifier. In many situations, the minifier would fail to minify errors involving dynamic shapes. These fixes make more programs minify successfully; many bugs fixed this week were fixed with help from the updated minifier.
  • Added some hacks to unblock hf_reformer. This bug manifested as another infinite loop in Sympy. @Chillee put in some hacks to fix it; I don’t really understand how it worked.
  • Support symbolic as_strided_, and make unsqueeze_ correct. The bug here is pretty simple: unsqueeze_ didn’t actually modify the tensor in place (see the sketch after this list). It was very difficult to diagnose without the minifier; after minification, it became clear that there were some unsqueeze_ shenanigans going on. @albanD helped by remembering that unsqueeze_ was implemented incorrectly on the branch. Going forward, I ask that people avoid committing known-wrong logic to the branch.
  • Correct dtype calc when arange gets SymInt as argument. This was tricky to diagnose because arange itself passes fine (with an incorrect return dtype), then there is a graph break, and the program only fails later in another subgraph. The bug itself was also subtle (see the sketch after this list), and we added a lint rule to catch future occurrences of it (Audit for error prone isinstance int/float and add lint)
  • properly return NotImplemented on binOp(symInt, tensor) calls. This bug was tricky to diagnose because understanding why it went wrong requires knowing how reflected magic methods in Python work: Python first attempts __mul__, and if that returns NotImplemented, Python silently swallows it and tries __rmul__ instead (see the sketch after this list). Previously this was accidentally working, but changes @bdhirsh made caused it to stop working.
  • Add support for torch.tensor(scalar_sym{int/float}). The short-term fix is to just guard on the contents of the SymInt (see the sketch after this list), but @anjali411 and @ezyang discussed a more permanent solution, which involves introducing a new operator int_tensor(SymInt[] data) that can be used to directly propagate small amounts of tensor data through without guarding.
  • Bugfixes for metas. A lot of meta operations are incorrectly implemented because they use .to(memory_format=...), which doesn’t do what you think it does (see the sketch after this list). An old issue, `x.to(memory_format=torch.contiguous_format)` does not always return a contiguous tensor · Issue #62027 · pytorch/pytorch · GitHub, has been revived and we are discussing what to do about this at the API level.
  • Improve argument printing. This is a dumb cosmetic problem that @ezyang eventually got fed up with and fixed. Sometimes we would report an error like “Argument types did not match: expected tuple of ints but got tuple.” This happened because the argument parser did not consistently report the inner type of the tuple/list that caused the problem. This is now fixed.
  • Convert torch.Size() argument to sym size in test_proxy_tensor. Some bugs in factory functions were not caught because, while we symintify input tensors, we don’t symintify non-tensor inputs (such as sizes). @albanD worked to beef up our testing here, taking advantage of the fact that OpInfos explicitly denote Size arguments with the Size tuple subclass. However, we still need to increase testing for int arguments that aren’t Sizes.
  • Add inplace function testing to test_proxy_tensor. We didn’t have any testing for inplace operators at all; in fact, OpInfo inplace testing is very poorly exercised. @albanD added more coverage here.
  • [discussion] fix for aot autograd outputs that dont require grad. This is a kind of embarrassing bug in AOTAutograd where we just didn’t think hard enough about what to do in various edge cases involving requires_grad. @soumith suggested that we should do a more careful audit of AOTAutograd for other edge cases along these lines.
  • symintify nll loss fns (#86915) by anjali411 · Pull Request #87095 · pytorch/pytorch · GitHub uncovered a bug in our default argument handling (specifically, we didn’t handle it at all for SymIntList). It was difficult to diagnose because we had uninitialized memory that would unpredictably fail asserts after using the default. @anjali411 has a follow up to make it explicitly error for any unrecognized argument types so that this doesn’t occur in the future.
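
For the “don’t clobber proxies” item above, here is a minimal sketch of the two strategies. The `set_proxy_*` helpers and the plain dict are made up for illustration and are not the actual tracing machinery; they only show why keeping the first proxy avoids orphaned SymIntNodes at the cost of faithfulness.

```python
# Hypothetical sketch: a SymIntNode shared by several tensors, and two ways
# of recording which fx proxy produced it.
proxies = {}  # id(SymIntNode) -> proxy name, keyed by object identity

def set_proxy_clobber(sym_node, proxy):
    # Old behavior: always overwrite. If the same SymIntNode object shows up
    # on a second tensor, the proxy recorded for the first tensor is lost.
    proxies[id(sym_node)] = proxy

def set_proxy_no_clobber(sym_node, proxy):
    # Branch hack: keep whatever proxy was recorded first. Faithfulness
    # suffers (s = r.size(0) may be reported as s = x.size(0)), but no
    # SymIntNode ends up with its proxy silently dropped.
    proxies.setdefault(id(sym_node), proxy)

class Node:  # stand-in for a SymIntNode
    pass

shared = Node()
set_proxy_no_clobber(shared, "x.size(0)")  # recorded when tracing x
set_proxy_no_clobber(shared, "r.size(0)")  # ignored: a proxy already exists
print(proxies[id(shared)])                 # x.size(0)
```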
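For the unsqueeze_ item, the contract being violated on the branch is just ordinary in-place semantics, which eager PyTorch already obeys:

```python
import torch

x = torch.randn(3)
y = x.unsqueeze_(0)        # trailing underscore: must mutate x itself
assert y is x              # in-place ops return the very same tensor object
assert x.shape == (1, 3)   # x's metadata was updated in place, not on a copy
```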
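The isinstance pitfall behind the arange dtype bug can be shown without any PyTorch internals; `FakeSymInt` below is a made-up stand-in for a symbolic integer and the dtype strings are purely illustrative.

```python
class FakeSymInt:
    """Pretend symbolic integer; notably *not* a subclass of int."""
    def __init__(self, hint):
        self.hint = hint

def naive_arange_dtype(end):
    # Buggy pattern: a symbolic int fails the isinstance(int) check, so we
    # wrongly fall through to the floating-point default.
    return "int64" if isinstance(end, int) else "float32"

def fixed_arange_dtype(end):
    # Treat symbolic ints like ints when inferring the result dtype.
    return "int64" if isinstance(end, (int, FakeSymInt)) else "float32"

print(naive_arange_dtype(FakeSymInt(5)))  # float32 (wrong)
print(fixed_arange_dtype(FakeSymInt(5)))  # int64
```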
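The reflected-method protocol behind the binOp(symInt, tensor) fix is plain Python; the classes below are invented purely to show the dispatch order.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # Handle plain numbers; for anything else, signal "not my job" so
        # Python will try the other operand's __rmul__ instead of erroring.
        if isinstance(other, (int, float)):
            return Meters(self.value * other)
        return NotImplemented

class Doubler:
    def __rmul__(self, other):
        # Only reached because Meters.__mul__ returned NotImplemented.
        return Meters(other.value * 2)

print((Meters(3) * 2).value)          # 6, handled by Meters.__mul__
print((Meters(3) * Doubler()).value)  # 6, dispatched to Doubler.__rmul__
```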
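For the torch.tensor(symint) item, the short-term fix amounts to specializing: turn the symbolic value into its concrete hint and record a guard. The `FakeSymInt` class and `guard_int` helper below are hypothetical stand-ins for the real tracing machinery.

```python
import torch

class FakeSymInt:
    """Stand-in for a symbolic int that carries a concrete hint."""
    def __init__(self, hint):
        self.hint = hint
    def __int__(self):
        return self.hint

guards = []

def guard_int(s):
    # Specialize: the traced graph is now only valid when s equals its hint.
    value = int(s)
    guards.append(f"s == {value}")
    return value

s = FakeSymInt(4)
t = torch.tensor(guard_int(s))  # works, but bakes the value 4 into the graph
print(t, guards)                # tensor(4) ['s == 4']
```

The proposed int_tensor(SymInt[] data) operator would avoid exactly this guard by carrying the symbolic values through the graph directly.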
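For the memory-format item, a hedged illustration of the pitfall: when a size-1 dimension makes the layout ambiguous, `.to(memory_format=...)` can return its input unchanged, so the strides you get back need not be the canonical contiguous strides a meta function might assume. The strides in the comments are what I would expect for this shape, but the general point is the ambiguity.

```python
import torch

# A (2, 1, 2, 2) tensor laid out channels_last. Because the channel dim has
# size 1, this layout also satisfies is_contiguous(), i.e. it is ambiguous.
x = torch.empty(2, 1, 2, 2, memory_format=torch.channels_last)
print(x.stride())         # channels_last-style strides, e.g. (4, 1, 2, 1)
print(x.is_contiguous())  # True, despite the channels_last strides

# .to() sees a tensor that already "is contiguous" and may hand it back
# unchanged, so the result need not have the dense row-major strides
# (4, 4, 2, 1) that a meta implementation might assume.
y = x.to(memory_format=torch.contiguous_format)
print(y.stride())
```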

Merge to master retrospective

We had very few reverts this week (hooray!)

What’s new on the branch this week?

What’s made it to master this week?

What’s coming next?

  • E2E training on master with inductor.
  • All benchmark models passing aot_eager training on the branch; tracked at Operators that need symbolic support - Google Sheets
  • Fallback implementation for custom operators without symbolic shape propagation, with output shapes inferred by running the fallback on real tensors (see the sketch after this list)
  • All OpInfo tests passing
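
On the custom-operator fallback item above, the rough idea is: when an op has no symbolic shape rule, materialize real tensors of the hinted sizes, run the real kernel once, and read the output shape off the result. A hedged sketch, with a made-up `fallback_shape` helper (not a PyTorch API):

```python
import torch

def fallback_shape(op, *args):
    # Replace every tensor argument with a real tensor of the same (hinted)
    # size and dtype, run the real kernel, and report the observed shape.
    real_args = [
        torch.zeros([int(s) for s in a.shape], dtype=a.dtype)
        if isinstance(a, torch.Tensor) else a
        for a in args
    ]
    out = op(*real_args)
    # The real system would wrap this back into fake tensors / fresh symbols;
    # here we only return the inferred shape.
    return tuple(out.shape)

print(fallback_shape(torch.mm, torch.empty(2, 3), torch.empty(3, 5)))  # (2, 5)
```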

We also had some discussions about “unbacked” symbolic integers (e.g., as produced by item()). @Lezcano has a proposal for how to organize decompositions: Proposal for a property-based tag system for prims, refs, and decompositions - Google Docs
