Also, I understand that it is better to implement fused_adam since it is far more efficient. I have such a function in dlprimitives, but it would still require me to implement several optimizers…
Does that mean it is present in nightly and not in older PyTorch versions (since it is a new PR)?
Yes, it is relatively new, but release 2.4 is out and this PR should be included in it.
Do I need to do anything as a backend developer to enable this fallback?
I do not believe so! This is registered on the CompositeExplicitAutograd key, which is an alias key. The alias key includes the PrivateUse1 backend key, so the kernel registered there is what gets dispatched to for PrivateUse1 unless another kernel is specifically registered for that key.
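If you later decide to override the fallback with a native fused kernel, a minimal sketch of the registration would look like the following. This assumes the op name is `aten::_fused_adam_` and that the parameter list below matches the schema in native_functions.yaml for your PyTorch version; the exact signature here is from memory, so it is worth double-checking (registration will fail loudly if it does not match).

```cpp
#include <torch/library.h>
#include <ATen/ATen.h>

// Hypothetical backend kernel: a real backend (e.g. pytorch_dlprim) would
// launch its device-side fused Adam implementation here. The parameter list
// is an assumption and must match the aten::_fused_adam_ schema exactly.
void fused_adam_privateuse1(
    at::TensorList params,
    at::TensorList grads,
    at::TensorList exp_avgs,
    at::TensorList exp_avg_sqs,
    at::TensorList max_exp_avg_sqs,
    at::TensorList state_steps,
    double lr,
    double beta1,
    double beta2,
    double weight_decay,
    double eps,
    bool amsgrad,
    bool maximize,
    const c10::optional<at::Tensor>& grad_scale,
    const c10::optional<at::Tensor>& found_inf) {
  // ... call into the backend's fused Adam kernel here ...
}

// Registering on the PrivateUse1 key makes the dispatcher pick this kernel
// instead of the CompositeExplicitAutograd fallback, for this op only; every
// op you do not register keeps using the fallback.
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("_fused_adam_", TORCH_FN(fused_adam_privateuse1));
}
```

Until such a registration exists, the CompositeExplicitAutograd implementation keeps being used for PrivateUse1 automatically, which is exactly the fallback described above.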