"Fused compiled autograd bwd + optimizer graph" - status update?

Hi everyone!

I recently read the 2024 H2 roadmap documents, and [KR 6.4] in the Compiler Core document (see the topic title) was of particular interest to me. We’re currently trying to solve this on our own, and since the end of the year is close, I was wondering what the status of this feature is. Is there an issue or pull request that’s regularly updated, or is the feature currently on hold? Many thanks!

@xmfan can you provide an update here?

The feature is on hold, but note that it is mostly about compatibility with the torch.optim APIs.

If you’re using a custom optimizer implementation, there’s already a way to capture both the backward and the optimizer logic into a single graph for fusion, via backward hooks: Fused backward + Simple optimizer implementation · GitHub.
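For reference, here is a minimal sketch of that hook-based pattern (not the linked gist verbatim; the toy model, learning rate, and the compiled-autograd config knob are placeholders, and the exact enabling API may differ across PyTorch versions). A simple SGD-style update runs in each parameter’s post-accumulate-grad hook, so compiled autograd can trace the update into the backward graph:

```python
import torch

# toy model and hyperparameter, purely for illustration
model = torch.nn.Linear(16, 16)
lr = 0.01

def make_sgd_hook(lr):
    def hook(param):
        # runs right after param.grad has been accumulated during backward
        with torch.no_grad():
            param.add_(param.grad, alpha=-lr)
            param.grad = None
    return hook

for p in model.parameters():
    p.register_post_accumulate_grad_hook(make_sgd_hook(lr))

# let compiled autograd capture the backward pass, hooks included
# (config knob name may vary across PyTorch versions)
torch._dynamo.config.compiled_autograd = True

@torch.compile
def train_step(x):
    loss = model(x).sum()
    loss.backward()  # backward + optimizer update end up in one compiled backward graph
    return loss

for _ in range(3):
    train_step(torch.randn(8, 16))
```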

Thank you for the prompt response and the code snippet, that seems fairly easy! I do wonder, though: is it possible to capture the full graph as well? For example, modifying the decorator in your example with fullgraph=True breaks, as expected, when it reaches the loss.backward() line. What I’d like to achieve is the complete training graph…
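For concreteness, the change I mean is roughly the following (function and model names are just placeholders from the snippet above):

```python
@torch.compile(fullgraph=True)  # fullgraph=True turns any graph break into an error
def train_step(x):
    loss = model(x).sum()
    loss.backward()  # Dynamo graph-breaks on .backward(), so this raises with fullgraph=True
    return loss
```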

For single-training-graph capture, we have an issue and a WIP PR open: [RFC] Dynamo Single Step Graph · Issue #117394 · pytorch/pytorch · GitHub, but it still requires some work.

I’m curious about your use case for it: is it just about larger fusions/cudagraphs? We’re deciding whether this is something we want to prioritize for the near future.