I have created dual tensors (based on dual numbers) as a data type for a research project I am working on. It is built on PyTorch and provides an alternative to backpropagation in cases where backpropagation fails. The implementation is vectorized and relies on PyTorch’s ability to execute on a GPU.
I want to contribute this back to the community. It could be treated as a new type, or it could be introduced as a forward differentiation mechanism. It can be used to create complex differentiable activation functions, or to differentiate functions that have traditionally been considered non-differentiable, such as BDF (backward differentiation formula) methods.
How do I contribute this? As far as I know, it is not on the current list.
The way we usually go about these things is as follows: they often start as a standalone package in their own repo. If they then gain enough traction on their own, an RFC is written and the addition to the main library is discussed.
On a different note, I would initially be skeptical about being able to use it to differentiate functions that have traditionally been considered “non-differentiable”. What is the mathematical derivation behind this? A function is differentiable if a certain limit exists. As such, a function either is or isn’t differentiable; there is no judgment call to be made here. In the context of convex analysis there are weaker notions of differentials (subdifferentials and so on), but it’s tricky to extend these to non-convex functions whose codomain is R^n (or a general manifold).
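For reference, the limit in question is the standard (Fréchet) definition of differentiability, stated here for a map between Euclidean spaces:

```latex
f:\mathbb{R}^n \to \mathbb{R}^m \text{ is differentiable at } x
\iff \exists\, J \in \mathbb{R}^{m \times n}:\;
\lim_{h \to 0} \frac{\lVert f(x+h) - f(x) - J h \rVert}{\lVert h \rVert} = 0,
\qquad Df(x) := J.
```

For example, f(x) = |x| fails this at x = 0: the difference quotient |h|/h tends to +1 from the right and to -1 from the left, so no J exists. Convex analysis instead assigns it the subdifferential ∂f(0) = [-1, 1].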
Furthermore, dual numbers are one way to interpret what we often call “forward differentiation” or “jvp” (Jacobian-vector product). See, for example, page 8 of https://arxiv.org/pdf/2207.06114.pdf. PyTorch already implements these ideas, so I wonder how your approach differs mathematically from them.
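To make the correspondence concrete, here is a minimal pure-Python sketch of dual-number arithmetic (the `Dual` class, `_lift`, and `sin` helper are hypothetical names for illustration, not anyone’s actual implementation). Seeding the tangent with 1 and evaluating the function once yields the derivative, which is exactly a forward-mode JVP. In PyTorch the same mechanism is exposed through `torch.autograd.forward_ad` (`dual_level`, `make_dual`, `unpack_dual`) and `torch.func.jvp`.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A dual number a + b*eps with eps**2 == 0 (hypothetical minimal sketch)."""
    primal: float   # a: the value
    tangent: float  # b: the directional derivative carried alongside it

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, because eps**2 == 0
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x: Dual) -> Dual:
    # Chain rule: sin(a + b eps) = sin(a) + cos(a) * b eps
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)


def f(x):
    return x * x + 3 * x + sin(x)


# Seed the tangent with 1.0 to get df/dx at x = 2 in a single forward pass.
y = f(Dual(2.0, 1.0))
print(y.primal, y.tangent)  # f(2) and f'(2) = 2*2 + 3 + cos(2)
```

No graph is recorded: each operation computes the value and its tangent together and moves on.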
Thank you for your feedback. The theory of dual numbers is indeed similar to forward propagation. However, the theory is more general than the mechanics of pushing gradients forward through a neural network.
Let me give you an example. Suppose I wanted to calculate the Jacobian of an LU decomposition (the sensitivity of the outputs with respect to the inputs). How would I do that using forward-mode AD? Further, suppose I wanted to differentiate through nested optimization (LU plus Newton’s method). These methods involve nested derivatives, and building a computational graph is inordinately expensive because the number of iterations needed to find the roots is unpredictable.
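As an illustration of forward-mode differentiation of an LU factorization, the sketch below runs a textbook Doolittle factorization on hypothetical `Dual` entries. It omits pivoting for brevity, so it assumes every leading principal minor of A is nonzero; the matrices `A0` and `dA` are arbitrary example values. Seeding A with a perturbation dA yields dL and dU, i.e. the Jacobian of the LU map applied to dA, and these must satisfy dA = dL·U + L·dU.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Dual number a + b*eps with eps**2 == 0 (hypothetical minimal sketch)."""
    primal: float
    tangent: float

    def __add__(self, o):
        return Dual(self.primal + o.primal, self.tangent + o.tangent)

    def __sub__(self, o):
        return Dual(self.primal - o.primal, self.tangent - o.tangent)

    def __mul__(self, o):
        return Dual(self.primal * o.primal,
                    self.primal * o.tangent + self.tangent * o.primal)

    def __truediv__(self, o):
        # (a + b eps) / (c + d eps) = a/c + (b c - a d) / c**2 eps
        return Dual(self.primal / o.primal,
                    (self.tangent * o.primal - self.primal * o.tangent) / o.primal ** 2)


def lu_doolittle(A):
    """Doolittle LU without pivoting on Dual entries: A = L @ U, unit-diagonal L."""
    n = len(A)
    zero, one = Dual(0.0, 0.0), Dual(1.0, 0.0)
    L = [[one if i == j else zero for j in range(n)] for i in range(n)]
    U = [[zero] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # row i of U
            s = zero
            for k in range(i):
                s = s + L[i][k] * U[k][j]
            U[i][j] = A[i][j] - s
        for j in range(i + 1, n):  # column i of L below the diagonal
            s = zero
            for k in range(i):
                s = s + L[j][k] * U[k][i]
            L[j][i] = (A[j][i] - s) / U[i][i]
    return L, U


A0 = [[4.0, 3.0], [6.0, 3.0]]  # primal matrix (example values)
dA = [[1.0, 0.0], [0.0, 2.0]]  # perturbation direction (example values)
A = [[Dual(A0[i][j], dA[i][j]) for j in range(2)] for i in range(2)]
L, U = lu_doolittle(A)
dL = [[x.tangent for x in row] for row in L]
dU = [[x.tangent for x in row] for row in U]
print(dL, dU)
```

Each pass gives one directional derivative, so assembling the full Jacobian of an n×n factorization takes n² passes (one per entry of A); that is the usual cost trade-off of forward mode.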
Dual numbers, on the other hand, do not construct a computational graph and have a constant memory footprint. If I wanted to differentiate through an optimization method involving Newton’s method with line search and LU decomposition that ran for, say, a million iterations, I don’t think there is a GPU in existence with enough memory to do that using constructs like a computational graph.
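To illustrate the constant-memory point, here is a sketch that pushes a dual number through Newton’s method. It is a scalar toy problem (solving x² = θ) rather than LU plus line search, and `Dual` and `newton_sqrt` are hypothetical names. Because the tangent is propagated alongside the value instead of being recorded for a later reverse pass, only the current iterate is kept, so memory does not grow with the iteration count. At convergence the tangent matches the implicit-function-theorem value dx/dθ = 1/(2x).

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Dual number a + b*eps with eps**2 == 0 (hypothetical minimal sketch)."""
    primal: float
    tangent: float

    def __sub__(self, o):
        return Dual(self.primal - o.primal, self.tangent - o.tangent)

    def __mul__(self, o):
        return Dual(self.primal * o.primal,
                    self.primal * o.tangent + self.tangent * o.primal)

    def __truediv__(self, o):
        return Dual(self.primal / o.primal,
                    (self.tangent * o.primal - self.primal * o.tangent) / o.primal ** 2)


def newton_sqrt(theta: Dual, iters: int = 100) -> Dual:
    """Solve x**2 - theta = 0 by Newton's method on dual numbers.

    Each step overwrites x, so memory stays constant however many
    iterations run; no computational graph is recorded.
    """
    two = Dual(2.0, 0.0)
    x = Dual(1.0, 0.0)  # initial guess; tangent 0 since it does not depend on theta
    for _ in range(iters):
        x = x - (x * x - theta) / (two * x)
    return x


root = newton_sqrt(Dual(4.0, 1.0), iters=10_000)  # seed d(theta) = 1
print(root.primal, root.tangent)  # approximately sqrt(4) = 2 and 1 / (2 * sqrt(4)) = 0.25
```

Raising `iters` only costs time, not memory.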
If you can assure me that forward-mode AD as implemented in PyTorch has a fixed memory footprint across these kinds of problems, I would be happy to use it; please point me in the right direction.
All of these things are implemented in PyTorch core (and in other libraries such as JAX and TensorFlow), and they have a solid, well-understood mathematical theory behind them. I would strongly recommend doing a more thorough literature review and, in particular, looking at how PyTorch, JAX and/or TensorFlow implement forward AD, and at the forward AD formulas for these methods specifically.
I did my PhD on dual numbers, so I think I know what I am talking about. The concept I implemented is several orders of magnitude faster than what is implemented in PyTorch or JAX, because of an extension I made to dual number theory as part of my PhD. Its speed is comparable to backpropagation.
My intent was to release this via the proper channels and do the work to make it compliant with the standards, but I don’t have time to debate this on a developer forum.
I will release it separately, and if you are interested you can take it from there. I have better things to do.
If you have a formal write-up of your theory, I’d be more than happy to read it. In particular, it would be good to understand how these ideas are framed within the usual mathematical theory. Even a pointer to the relevant part of your thesis or papers would be really helpful.
At any rate, I do think that the best way forward is to release this as a standalone library and, if it gets traction from the community, submit it as a feature request for core.