State of PT2: Jan 28, 2024 edition
- Calibrations were this week.
- Joel, Jeffrey, Alban, Brian and Edward convened to discuss subclass view fakeification again. Subclass view fakeification occurs when we are given a tensor subclass which is a view of another tensor (the canonical example is a nested tensor which is a view of a dense tensor holding all the packed data), and we need to convert it into a faithful fake representation so we can simulate operations on it in Dynamo and AOT Autograd. Construction of views in fake tensors is traditionally done by fakeifying the base tensor and then reapplying a recorded view function which specifies how to “replay” the view on an arbitrary new base. The problem with subclass view fakeification is that these view functions typically hard-code the sizes / tensors that are free variables of the view operation, but when fakeifying, these need to be swapped out for symbolic integers and corresponding fake tensors. Joel’s resolution after the meeting was to reify view functions so that this information can be swapped (a small sketch of the idea is below). Notes: https://docs.google.com/document/d/1C5taWiplmX7nKiURXDOAZG2W5VNJ2iV0fQFq92H0Cxw/edit#heading=h.vl1gidtprtoo
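To make the problem concrete, here is a minimal sketch of the difference between a recorded view closure and a reified view function. This is not PyTorch's actual internals; torch.split stands in for the nested-tensor view, and the function names are made up for illustration:

```python
import torch

# Traditional recorded view: the replay closure bakes in the concrete sizes
# it captured, so fakeification has no way to substitute SymInts or fake
# tensors for them.
def record_view_closure(split_sizes):
    def replay(new_base):
        return torch.split(new_base, split_sizes)  # split_sizes frozen here
    return replay

# "Reified" view function: the free variables of the view are explicit
# arguments, so the caller (e.g. the fakeification machinery) can swap them
# out before replaying the view on a new (fake) base.
def reified_view(new_base, split_sizes):
    return torch.split(new_base, split_sizes)

base = torch.randn(10)
replay = record_view_closure([4, 6])
a = replay(base)                  # always splits as [4, 6]
b = reified_view(base, [3, 7])    # free variables can be substituted
```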
- Shampoo compile time is still a problem. I talked to some folks on Wednesday who were like “our job is stuck in produce_guards”, and it turned out to be the exact same tensors_definitely_do_not_overlap guard explosion that caused two other SEVs, described at Meta issue: Automatic dynamic shapes can cause compile-time performance cliffs · Issue #118213 · pytorch/pytorch · GitHub. Brian is going to try to fix the tensors_definitely_do_not_overlap problem in a few weeks. I showed Yanbo how to navigate the logs to find the culprit (in this case, just searching for symbolic_shapes logs was enough to identify this as the same problem). There is some difficulty reliably turning off automatic dynamic shapes (which would help with this problem) that needs to be studied in more detail; the relevant knobs are sketched below.
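For reference, a sketch of the two knobs involved, assuming flag and logging names from recent PyTorch builds (double-check against your version): TORCH_LOGS surfaces the symbolic_shapes guard logs, and the Dynamo config flag turns automatic dynamic shapes off so a size change doesn't silently flip a graph dynamic.

```python
# Shell: show symbolic_shapes / guard logs for the run
#   TORCH_LOGS="dynamic" python train.py

import torch._dynamo

# Keep shapes static unless explicitly marked dynamic; avoids automatic
# dynamic shapes kicking in and triggering expensive guards like
# tensors_definitely_do_not_overlap. (Flag name as of recent versions.)
torch._dynamo.config.automatic_dynamic_shapes = False
```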
- Two interesting new posts, which I highly recommend: Micro-optimizations for the most micro of benchmarks and [RFC] New Python operator registration API
- I’ve been talking Elias and Mario through the plan to remove unsound 0/1 specialization (Eliminate compile time ranges for a simpler analysis · Issue #117361 · pytorch/pytorch · GitHub) and people are on board; I am going to implement this next week. A quick illustration of the 0/1 specialization behavior is below.
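For readers who haven't hit it, a small example of what 0/1 specialization means in practice (assuming current behavior, where sizes 0 and 1 are specialized even under dynamic shapes):

```python
import torch

@torch.compile(dynamic=True)
def f(x):
    return x * 2

f(torch.randn(4))  # compiled with a symbolic size s0, assumed >= 2
f(torch.randn(8))  # reuses the dynamic graph
f(torch.randn(1))  # size 1 is specialized today, so this recompiles
```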
- A bit of chatter about what to do about backtraces from distributed jobs interleaving with each other. Chip Turner shared that if you pass appropriate arguments to torchrun, like
torchrun --role mnist-trainer --log-dir /tmp/l -t 3 -r 3 mnist/main.py
good things happen (-t/--tee and -r/--redirects with value 3 capture both stdout and stderr per rank under the log dir, so ranks don’t interleave). Unfortunately, in internal prod we are still shoving everything to stderr, but maybe we can change that. Meta only: Redirecting...
- We’re using dmypy instead of mypy for typechecking now in lintrunner. Typechecking is a lot faster! If you think there’s some weird cache problem, you can say
dmypy stop
to restart the daemon.
- A lot of dynamic shapes bugs in Inductor specifically this week: Inductor sizevars wrapper assignment DCE hazard · Issue #118385 · pytorch/pytorch · GitHub, assert isinstance(value, CppCSEVariable) and value.is_vec · Issue #118379 · pytorch/pytorch · GitHub, and Inductor mixed device operations not handled correctly, maybe buffer reuse problem? · Issue #118299 · pytorch/pytorch · GitHub. I’ve personally been mucking around a bit in Inductor recently!
- Landed stuff:
- Realize inputs to DynamicScalar before unwrapping storage - another “oops we fed the wrong thing to an extern kernel” bug
- Landed from last week: Fix several bugs related to unbacked SymInt codegen in inductor, Rename unbacked SymInt prefix to u, Document OpsHandler protocol
- Notable new bugs: