State of symbolic shapes branch

State of symbolic shapes branch: Oct 22 edition

The symbolic-shapes branch (PyTorch: Symbolic shapes by ezyang · Pull Request #84246 · pytorch/pytorch · GitHub ) is a long-running branch containing a large number of features and bugfixes related to dynamic shapes support in PyTorch. Previous update: State of symbolic shapes branch - #7 by ezyang

Commit ID at time of writing: 5f11aa560bdb406e6826e355edf19bda6174d63f

Executive summary

We focused on model enablement this week. We’re starting to hit the last mile on training, which means our pace is slowing as we spend time fixing more complicated bugs, though the pace on the branch is still faster than the rate we are merging code to master.

  • This major bug deserves a bullet of its own: we identified that torchdynamo was over-suppressing errors (even when dynamic shapes was off), artificially inflating the pass rate on the dynamic shapes dashboard. This was fixed in [dynamo] Unify raise_on_* config to suppress_errors and raise by default by ezyang · Pull Request #87440 · pytorch/pytorch · GitHub . This update post reports training status with errors suppressed, but in subsequent update posts we will stop suppressing errors for a more accurate depiction of our status. (On the plus side, error suppression would not have affected inductor speedup numbers, so even if we were skipping some blocks when compiling, we are still getting good speedups.)
  • Model training status on symbolic-shapes (run by @ezyang); see also Operators that need symbolic support - Google Sheets for hand-audited results (not completely up to date at the moment, but actively used for task assignment). These runs were done prior to the error-suppression fix, so errors are suppressed in the numbers below.
    • torchbench: 44 out of 55 (+8 WoW)
    • huggingface: 34 out of 44 (new)
    • timm: 38 out of 62 (new)
  • Model inference status on master (run by @ezyang); these runs are all WITHOUT suppressing errors. The pass rate is artificially depressed because the PyTorch build used for these runs did not have numpy support.
    • torchbench inductor: 27 out of 46 (+18 WoW)
    • torchbench aot_eager: 35 out of 46 (+20 WoW)
    • torchbench aot_eager, without dynamic shapes (baseline): 39 out of 46 (new)
  • OpInfo tests on symbolic-shapes (@ezyang)

Previous branch diff: 30 files changed, 1209 insertions(+), 225 deletions(-)
Current branch diff: 70 files changed, 1681 insertions(+), 430 deletions(-)

Notable bug fixes

  • [HACK] Don’t clone, instead don’t clobber proxies. This is a multi-week saga (see the sketch after this list). The beginning of the story is that we noticed partitioning was failing, because occasionally size computation in the forward pass depended on size expressions on backward inputs. @wconstab identified that this was because we were overwriting proxies when a SymIntNode was reused on multiple tensors. To fix this, Will introduced a clone when we set SymInts into tensors (Clone symint on set_sizes_and_strides by wconstab · Pull Request #85878 · pytorch/pytorch · GitHub), so that when we assigned proxies to the SymInts of a tensor returned by an operator, we would always have fresh SymIntNodes, preventing overwriting. However, after this fix landed, we noticed that occasionally SymIntNodes would show up that didn’t have any proxies at all! Brian fixed one particular instance of this (functionalization: skip meta reference compute for aot autograd), but there were others. We still don’t know how this situation arises (the minifier didn’t work), but @ezyang is testing an alternate fix on the branch, where instead of cloning SymInts, we simply avoid overwriting proxies if one already exists. This is a hack because it means we can’t faithfully report user programs (e.g., if you write r = torch.add(x, y); s = r.size(0), this might end up being reported as s = x.size(0)), but after a few days of testing it seems to have fixed the problem without causing any other regressions on benchmark models.
  • Fixed FakeTensor not calling CompositeImplicitAutograd decomps sometimes. @anjali411 noticed that one of her models was stuck in an infinite loop involving _to_copy decomps. She and @ngimel unsuccessfully tried to debug it. @Chillee eventually pinned down the root cause by tracing through each decomposition the loop went through, and breaking the loop with a small change to fake tensor. This spawned a discussion about our use of decomposition tables being too complicated, which @SherlockNoMad has been working to refactor.
  • fix minifier and minor improvements to minifier. In many situations, the minifier would fail to minify errors involving dynamic shapes. These fixes make more programs minify successfully; many bugs fixed this week were fixed with help from the updated minifier.
  • Added some hacks to unblock hf_reformer. This bug manifested as another infinite loop in Sympy. @Chillee put in some hacks to fix it; I don’t really understand how it worked.
  • Support symbolic as_strided_, and make unsqueeze_ correct. The bug here is pretty simple: unsqueeze_ didn’t actually modify the tensor in place (see the sketch after this list). It was very difficult to diagnose without the minifier; after minification, it became clear that there were some unsqueeze_ shenanigans going on. @albanD helped by remembering that unsqueeze_ was implemented incorrectly on the branch. Going forward, I ask that people avoid committing known-wrong logic to the branch.
  • Correct dtype calc when arange gets SymInt as argument. This was tricky to diagnose because arange itself passes fine (with an incorrect return dtype), then there is a graph break, and the program only fails later in another subgraph. The bug itself was also subtle (see the sketch after this list), and we added a lint rule to catch future occurrences of it (Audit for error prone isinstance int/float and add lint)
  • properly return NotImplemented on binOp(symInt, tensor) calls. This bug was tricky to diagnose because understanding why it went wrong requires knowing how reflected magic methods in Python work: Python first attempts __mul__, and if that returns NotImplemented, Python silently swallows it and tries __rmul__ instead (see the sketch after this list). Previously this was accidentally working, but changes @bdhirsh made caused it to stop working.
  • Add support for torch.tensor(scalar_sym{int/float}). The short-term fix is to just guard on the contents of the SymInt (see the sketch after this list), but @anjali411 and @ezyang discussed a more permanent solution, which involves introducing a new operator int_tensor(SymInt[] data) that can be used to directly propagate small amounts of tensor data through without guarding.
  • Bugfixes for metas. A lot of meta operations are incorrectly implemented because they use .to(memory_format=...), which doesn’t do what you think it does (see the sketch after this list). An old issue, `x.to(memory_format=torch.contiguous_format)` does not always return a contiguous tensor · Issue #62027 · pytorch/pytorch · GitHub, has been revived and we are discussing what to do about this at the API level.
  • Improve argument printing. This is a dumb cosmetic problem that @ezyang eventually got fed up with and fixed. Sometimes we would report an error like “Argument types did not match: expected tuple of ints but got tuple.” This happened because the argument parser did not consistently report the inner type of the tuple/list that caused the problem. This is now fixed.
  • Convert torch.Size() argument to sym size in test_proxy_tensor. Some bugs in factory functions were not caught because, while we symintify input tensors, we don’t symintify non-tensor inputs (such as sizes). @albanD worked to beef up our testing here, taking advantage of the fact that OpInfos explicitly denote Size arguments with the Size tuple subclass. However, we still need to increase testing for int arguments that aren’t Sizes.
  • Add inplace function testing to test_proxy_tensor. We didn’t have any testing for inplace operators at all; in fact, OpInfo inplace testing is very poorly exercised. @albanD added more coverage here.
  • [discussion] fix for aot autograd outputs that dont require grad. This is a kind of embarrassing bug in AOTAutograd where we just didn’t think hard enough about what to do in various edge cases involving requires_grad. @soumith suggested that we should do a more careful audit of AOTAutograd for other edge cases along these lines.
  • symintify nll loss fns (#86915) by anjali411 · Pull Request #87095 · pytorch/pytorch · GitHub uncovered a bug in our default argument handling (specifically, we didn’t handle it at all for SymIntList). It was difficult to diagnose because we had uninitialized memory that would unpredictably fail asserts after using the default. @anjali411 has a follow up to make it explicitly error for any unrecognized argument types so that this doesn’t occur in the future.
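
For the “don’t clobber proxies” item above, here is a minimal sketch of the two strategies. The `set_proxy_*` helpers and the plain dict are made up for illustration and are not the actual tracing machinery; they only show why keeping the first proxy avoids orphaned SymIntNodes at the cost of faithfulness.

```python
# Hypothetical sketch: a SymIntNode shared by several tensors, and two ways
# of recording which fx proxy produced it.
proxies = {}  # id(SymIntNode) -> proxy name, keyed by object identity

def set_proxy_clobber(sym_node, proxy):
    # Old behavior: always overwrite. If the same SymIntNode object shows up
    # on a second tensor, the proxy recorded for the first tensor is lost.
    proxies[id(sym_node)] = proxy

def set_proxy_no_clobber(sym_node, proxy):
    # Branch hack: keep whatever proxy was recorded first. Faithfulness
    # suffers (s = r.size(0) may be reported as s = x.size(0)), but no
    # SymIntNode ends up with its proxy silently dropped.
    proxies.setdefault(id(sym_node), proxy)

class Node:  # stand-in for a SymIntNode
    pass

shared = Node()
set_proxy_no_clobber(shared, "x.size(0)")  # recorded when tracing x
set_proxy_no_clobber(shared, "r.size(0)")  # ignored: a proxy already exists
print(proxies[id(shared)])                 # x.size(0)
```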
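For the unsqueeze_ item, the contract being violated on the branch is just ordinary in-place semantics, which eager PyTorch already obeys:

```python
import torch

x = torch.randn(3)
y = x.unsqueeze_(0)        # trailing underscore: must mutate x itself
assert y is x              # in-place ops return the very same tensor object
assert x.shape == (1, 3)   # x's metadata was updated in place, not on a copy
```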
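The isinstance pitfall behind the arange dtype bug can be shown without any PyTorch internals; `FakeSymInt` below is a made-up stand-in for a symbolic integer and the dtype strings are purely illustrative.

```python
class FakeSymInt:
    """Pretend symbolic integer; notably *not* a subclass of int."""
    def __init__(self, hint):
        self.hint = hint

def naive_arange_dtype(end):
    # Buggy pattern: a symbolic int fails the isinstance(int) check, so we
    # wrongly fall through to the floating-point default.
    return "int64" if isinstance(end, int) else "float32"

def fixed_arange_dtype(end):
    # Treat symbolic ints like ints when inferring the result dtype.
    return "int64" if isinstance(end, (int, FakeSymInt)) else "float32"

print(naive_arange_dtype(FakeSymInt(5)))  # float32 (wrong)
print(fixed_arange_dtype(FakeSymInt(5)))  # int64
```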
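The reflected-method protocol behind the binOp(symInt, tensor) fix is plain Python; the classes below are invented purely to show the dispatch order.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # Handle plain numbers; for anything else, signal "not my job" so
        # Python will try the other operand's __rmul__ instead of erroring.
        if isinstance(other, (int, float)):
            return Meters(self.value * other)
        return NotImplemented

class Doubler:
    def __rmul__(self, other):
        # Only reached because Meters.__mul__ returned NotImplemented.
        return Meters(other.value * 2)

print((Meters(3) * 2).value)          # 6, handled by Meters.__mul__
print((Meters(3) * Doubler()).value)  # 6, dispatched to Doubler.__rmul__
```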
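For the torch.tensor(symint) item, the short-term fix amounts to specializing: turn the symbolic value into its concrete hint and record a guard. The `FakeSymInt` class and `guard_int` helper below are hypothetical stand-ins for the real tracing machinery.

```python
import torch

class FakeSymInt:
    """Stand-in for a symbolic int that carries a concrete hint."""
    def __init__(self, hint):
        self.hint = hint
    def __int__(self):
        return self.hint

guards = []

def guard_int(s):
    # Specialize: the traced graph is now only valid when s equals its hint.
    value = int(s)
    guards.append(f"s == {value}")
    return value

s = FakeSymInt(4)
t = torch.tensor(guard_int(s))  # works, but bakes the value 4 into the graph
print(t, guards)                # tensor(4) ['s == 4']
```

The proposed int_tensor(SymInt[] data) operator would avoid exactly this guard by carrying the symbolic values through the graph directly.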
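For the memory-format item, a hedged illustration of the pitfall: when a size-1 dimension makes the layout ambiguous, `.to(memory_format=...)` can return its input unchanged, so the strides you get back need not be the canonical contiguous strides a meta function might assume. The strides in the comments are what I would expect for this shape, but the general point is the ambiguity.

```python
import torch

# A (2, 1, 2, 2) tensor laid out channels_last. Because the channel dim has
# size 1, this layout also satisfies is_contiguous(), i.e. it is ambiguous.
x = torch.empty(2, 1, 2, 2, memory_format=torch.channels_last)
print(x.stride())         # channels_last-style strides, e.g. (4, 1, 2, 1)
print(x.is_contiguous())  # True, despite the channels_last strides

# .to() sees a tensor that already "is contiguous" and may hand it back
# unchanged, so the result need not have the dense row-major strides
# (4, 4, 2, 1) that a meta implementation might assume.
y = x.to(memory_format=torch.contiguous_format)
print(y.stride())
```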

Merge to master retrospective

We had very few reverts this week (hooray!)

What’s new on the branch this week?

What’s made it to master this week?

What’s coming next?

  • E2E training on master with inductor.
  • All benchmark models passing aot_eager training on the branch; tracked at Operators that need symbolic support - Google Sheets
  • Fallback implementation for custom operators without symbolic shape propagation, with output shapes inferred by running the fallback on real tensors (see the sketch after this list)
  • All OpInfo tests passing
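
On the custom-operator fallback item above, the rough idea is: when an op has no symbolic shape rule, materialize real tensors of the hinted sizes, run the real kernel once, and read the output shape off the result. A hedged sketch, with a made-up `fallback_shape` helper (not a PyTorch API):

```python
import torch

def fallback_shape(op, *args):
    # Replace every tensor argument with a real tensor of the same (hinted)
    # size and dtype, run the real kernel, and report the observed shape.
    real_args = [
        torch.zeros([int(s) for s in a.shape], dtype=a.dtype)
        if isinstance(a, torch.Tensor) else a
        for a in args
    ]
    out = op(*real_args)
    # The real system would wrap this back into fake tensors / fresh symbols;
    # here we only return the inferred shape.
    return tuple(out.shape)

print(fallback_shape(torch.mm, torch.empty(2, 3), torch.empty(3, 5)))  # (2, 5)
```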

We also had some discussions about “unbacked” symbolic integers (e.g., as produced by item()). @Lezcano has a proposal for how to organize decompositions: Proposal for a property-based tag system for prims, refs, and decompositions - Google Docs
