A simplified introduction to PyTorch's autograd implementation

Cool!
So one thing I’ve been wondering about is whether we should make this type of experiment easier by allowing programmatic access to derivatives.
The “easy” part might be exposing what is in derivatives.yaml / what is generated from it.
The hard part is what to do with the functions in ATen that have “tape-based differentiation”.
This would also enable us to have multiple implementations of autograd (autograd classic, autodiff, “hacker’s autograd”, i.e. in Python) use the same per-operator derivatives.
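
For the “easy” part, here is a minimal sketch of what programmatic access could look like today: just parsing the formula strings out of derivatives.yaml in a source checkout with PyYAML. The path, the entry layout, and the sin example are assumptions based on the current source tree, not a stable interface.

```python
# Rough sketch (not a PyTorch API): read the raw per-operator derivative
# formulas from tools/autograd/derivatives.yaml, the file the autograd
# codegen consumes. Path and entry layout are assumptions, not a contract.
import yaml

DERIVATIVES_YAML = "tools/autograd/derivatives.yaml"  # inside a PyTorch checkout

def load_derivative_formulas(path=DERIVATIVES_YAML):
    """Map each operator schema string to its derivative formula strings."""
    with open(path) as f:
        entries = yaml.safe_load(f)
    formulas = {}
    for entry in entries:
        name = entry.pop("name")
        # The remaining string-valued keys are argument names ("self",
        # "other", ...) mapped to C++ expressions such as "grad * self.cos()".
        formulas[name] = {k: v for k, v in entry.items() if isinstance(v, str)}
    return formulas

formulas = load_derivative_formulas()
print(formulas.get("sin(Tensor self) -> Tensor"))
# e.g. {'self': 'grad * self.cos()'}
```

That only surfaces the raw expression strings, of course; turning them into something callable from Python is where the codegen (and the tape-based ATen functions) come in.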

Best regards

Thomas