Min-cut optimal(*) recomputation (i.e. activation checkpointing) with AOTAutograd
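For reference, here is a minimal sketch of what invoking this looks like through `functorch.compile`, assuming the `aot_function` / `min_cut_rematerialization_partition` entry points; the toy function and the print-compiler below are illustrative only and not taken from the thread.

```python
# A minimal sketch (not from the thread): drive AOTAutograd with the
# min-cut rematerialization partitioner via functorch.compile.
import torch
from functorch.compile import aot_function, min_cut_rematerialization_partition

def fn(x):
    # Cheap, easily recomputable pointwise ops feeding a matmul.
    return torch.mm(x.cos().cos(), x)

def print_compiler(fx_module, example_inputs):
    # Print the partitioned forward/backward graph, then run it unchanged.
    print(fx_module.code)
    return fx_module

aot_fn = aot_function(
    fn,
    fw_compiler=print_compiler,
    bw_compiler=print_compiler,
    partition_fn=min_cut_rematerialization_partition,
)

x = torch.randn(4, 4, requires_grad=True)
aot_fn(x).sum().backward()
```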

Oh, very cool!

I think this sounds very similar to what we’re doing :sweat_smile: I hadn’t seen a similar approach in any prior literature, but it’s intuitive enough that I’m not too surprised.

Reading more closely, I think it might actually be morally identical.

Any other good ideas on ways to optimize autodiff? :stuck_out_tongue: