Forward AD: 2021 Retrospective

Hi everyone! Here’s an update on forward AD, a major new autograd feature first shared at the beginning of 2021 (RFC) that provides significant speed and memory improvements in some use cases. As of our last update, forward AD was able to compute Jacobian-vector products for a number of ops with real and complex dtypes. In this update we share what we’ve done since and what we plan to do next - with Alban Desmaison, Anjali Chourdia, Ivan Yashchuk, Mario Lezcano, Matthias Reis aka mattarroz, Nik Vedeneev, Richard Zou
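
As a quick refresher, the low-level API computes Jacobian-vector products through dual tensors. A minimal sketch using the public torch.autograd.forward_ad API (torch.sin chosen here just for illustration):

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)

with fwAD.dual_level():
    # A dual tensor carries the primal value and its tangent together
    dual = fwAD.make_dual(primal, tangent)
    out = torch.sin(dual)
    # unpack_dual returns the primal output and its tangent;
    # the tangent is the Jacobian-vector product
    y, y_t = fwAD.unpack_dual(out)

# For sin, the JVP is cos(primal) * tangent
```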

Things we’ve done:

  • Added many more formulas and improved existing ones (with Nik Vedeneev, Mario Lezcano, Alban Desmaison, Ivan Yashchuk, Richard Zou, Matthias Reis). According to OpInfos, we’ve covered 346 of 436 ops.
    • Writing forward AD formulas has also been made significantly easier by the following work, which I’d like to recognize:
      • OpInfos significantly simplified testing (Mike Ruberry, Saketh Are, QS, and many more)
      • Work on convolution consolidation, which reduced the number of formulas I had to write for convolution from 32 to only 2. (Joel Schlosser)
  • Added forward AD support for custom functions (Alban Desmaison)
  • Integrated forward AD into functorch as jvp (with Richard Zou, Alban Desmaison)
  • Performance improvements via ZeroTensors (issues/69687): ~30% runtime reduction for jvp computation when not all arguments of an operator are dual tensors (Anjali Chourdia, Alban Desmaison)
  • Added a tutorial: Forward-mode Automatic Differentiation
    • The tutorial covers 1) basic usage with the low-level API, 2) custom function support, and 3) usage with modules (coming soon: functorch API usage, e.g. jvp)
  • Extended the autograd.functional API to allow the use of forward AD in Jacobian and Hessian computation
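
To illustrate the custom function support: a Function can define a jvp staticmethod that receives the input tangents and returns the output tangents. A toy sketch (the op and name here are made up for illustration, and we skip saving state between forward and jvp to keep it minimal):

```python
import torch
import torch.autograd.forward_ad as fwAD

class ScaleAndShift(torch.autograd.Function):
    """Computes y = 2 * x + 1; the JVP of a linear map is just 2 * x_t."""

    @staticmethod
    def forward(ctx, x):
        return 2 * x + 1

    @staticmethod
    def jvp(ctx, x_t):
        # Receives the tangent of each input, returns the tangent of each output
        return 2 * x_t

    @staticmethod
    def backward(ctx, grad_out):
        # Reverse-mode counterpart, for completeness
        return 2 * grad_out

primal = torch.randn(4)
tangent = torch.ones(4)
with fwAD.dual_level():
    dual = fwAD.make_dual(primal, tangent)
    out = ScaleAndShift.apply(dual)
    y, y_t = fwAD.unpack_dual(out)
```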
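
And a sketch of the extended autograd.functional API: torch.autograd.functional.jacobian accepts strategy="forward-mode" (currently in combination with vectorize=True), which tends to help when a function has more outputs than inputs:

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x ** 2  # elementwise square; the Jacobian is diag(2 * x)

x = torch.randn(5)
# Forward-mode builds the Jacobian from jvps instead of vjps;
# it currently requires vectorize=True
J = jacobian(f, x, strategy="forward-mode", vectorize=True)
```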

Things we still want to finish for 1.11:

  • More formulas (contributions welcome! issues/71117)
  • Small UX improvements for custom functions
  • Make documentation public

Things planned after 1.11:

  • Jacobian and Hessian computation when the sparsity pattern is known, e.g. diagonal or block-diagonal. We plan to add this to both autograd.functional and functorch.

Things we don’t plan on working on for now:

  • Multi-level forward-mode AD: We don’t plan on supporting nesting dual levels at the moment as this use case is already possible in functorch.
  • Updating autograd.functional.jvp: this and the rest of the forward-mode AD integration with autograd.functional are on hold while we wait for a resolution on how this API converges with functorch. In the meantime, one can use jvp from functorch.
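
For reference, using jvp from functorch as suggested above looks roughly like this (torch.sin chosen just as an example op):

```python
import torch
from functorch import jvp  # newer releases also expose this as torch.func.jvp

def f(x):
    return torch.sin(x)

primal = torch.randn(3)
tangent = torch.randn(3)
# jvp evaluates f(primal) and the Jacobian-vector product J_f(primal) @ tangent
# in a single forward pass
out, out_tangent = jvp(f, (primal,), (tangent,))
```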

Thanks to everyone who contributed!

If there’s anything you’d like to see added, please file an issue or comment below.