How does torch.compile work with autograd?

Chillee · October 31, 2023, 5:10am

the example of sin in the slides seems to have no memory benefit I think

Well, the example in the slides was just about explaining what the joint graph looks like - not about what an optimized version looks like.

I’m confused by the word “partition”. By partition, we usually mean to partition the graph into disjoint subsets.

Haha, this is true. In this case, the min-cut “partitioner” is not really partitioning the graph into two disjoint subsets - the backwards pass will be recomputing significant parts of the forwards pass. I still think of it as a “partitioning” problem because we’re given a graph with signature joint(fw_inputs, bw_inputs) => (fw_outputs, bw_outputs), and we need to return two graphs forward(fw_inputs) => (fw_outputs, activations) and backward(activations, bw_inputs) => bw_outputs.
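As a rough illustration of those three signatures, here is a plain-Python sketch for y = sin(x) on scalars. The function bodies and the choice of saving x as the only activation are assumptions made for illustration; this is not AOTAutograd's actual API or the graph it would produce.

```python
import math

def joint(x, grad_out):
    # joint(fw_inputs, bw_inputs) => (fw_outputs, bw_outputs)
    y = math.sin(x)
    grad_x = grad_out * math.cos(x)
    return y, grad_x

def forward(x):
    # forward(fw_inputs) => (fw_outputs, activations)
    y = math.sin(x)
    activations = (x,)  # assumed choice: save the input, not cos(x)
    return y, activations

def backward(activations, grad_out):
    # backward(activations, bw_inputs) => bw_outputs
    (x,) = activations
    return grad_out * math.cos(x)  # cos(x) is computed here, in the backward

# The split pair reproduces the joint function's results.
y, saved = forward(0.5)
grad_x = backward(saved, 2.0)
joint_y, joint_grad_x = joint(0.5, 2.0)
assert math.isclose(y, joint_y) and math.isclose(grad_x, joint_grad_x)
```

Either x or cos(x) could serve as the saved activation here; both are a single value, which is one way to see why the sin example on its own shows no memory benefit.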

I used to think AOT Autograd could discover something like an optimized sigmoid (e.g. users write eager code z = 1 / (1 + torch.exp(-x)), and we can figure out the smart backward as z * (1 - z)). Now I understand what AOT Autograd can do.

It’s possible we could make these kinds of decisions automatically in AOTAutograd (we have all the information), but I’m actually not totally sure this is even the right thing to do. In this case, we would just recompute sigmoid forwards in the backwards pass, which I think will be just as efficient.
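To make the comparison concrete, here is a small plain-Python sketch of the two backward strategies for a scalar sigmoid. The function names are hypothetical, and the recomputing variant is only my guess at what differentiating through 1 / (1 + exp(-x)) op by op would look like, not actual AOTAutograd output.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Strategy A: the hand-written "smart" backward. Save z, use z * (1 - z).
def forward_save_output(x):
    z = sigmoid(x)
    return z, (z,)

def backward_from_output(activations, grad_out):
    (z,) = activations
    return grad_out * z * (1.0 - z)

# Strategy B: save only x and recompute the forward pieces in the backward.
def forward_save_input(x):
    return sigmoid(x), (x,)

def backward_recompute(activations, grad_out):
    (x,) = activations
    e = math.exp(-x)            # recomputed from the saved input
    denom = 1.0 + e
    return grad_out * e / (denom * denom)  # d/dx of 1 / (1 + exp(-x))

x, grad_out = 1.3, 2.0
_, saved_a = forward_save_output(x)
_, saved_b = forward_save_input(x)
grad_a = backward_from_output(saved_a, grad_out)
grad_b = backward_recompute(saved_b, grad_out)
assert math.isclose(grad_a, grad_b)
```

Both strategies save a single value and give the same gradient. The recomputing version does a few extra pointwise operations in the backward, which is the trade-off the reply above expects to cost very little.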
