Min-cut optimal(*) recomputation (i.e. activation checkpointing) with AOTAutograd
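For reference, here is a minimal sketch of what invoking this looks like through `functorch.compile`, assuming the `aot_function` / `min_cut_rematerialization_partition` entry points; the toy function and the print-compiler below are illustrative only and not taken from the thread.

```python
# A minimal sketch (not from the thread): drive AOTAutograd with the
# min-cut rematerialization partitioner via functorch.compile.
import torch
from functorch.compile import aot_function, min_cut_rematerialization_partition

def fn(x):
    # Cheap, easily recomputable pointwise ops feeding a matmul.
    return torch.mm(x.cos().cos(), x)

def print_compiler(fx_module, example_inputs):
    # Print the partitioned forward/backward graph, then run it unchanged.
    print(fx_module.code)
    return fx_module

aot_fn = aot_function(
    fn,
    fw_compiler=print_compiler,
    bw_compiler=print_compiler,
    partition_fn=min_cut_rematerialization_partition,
)

x = torch.randn(4, 4, requires_grad=True)
aot_fn(x).sum().backward()
```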

Oh, very cool!

I think this sounds very similar to what we’re doing :sweat_smile: I hadn’t seen a similar approach in any prior literature, but it’s intuitive enough that I’m not too surprised.

Reading more closely, I think it might actually be morally identical.

Any other good ideas on ways to optimize autodiff? :stuck_out_tongue: