Also, I understand that it is better to implement fused_adam since it is far more efficient. I have such a function in dlprimitives, but it would still require me to implement several optimizers…
Does that mean it is present in nightly and not in older PyTorch versions (since it is a new PR)?
Yes, it is relatively new, but release 2.4 is out and this PR should be included in it.
Do I need to do anything as a backend developer to enable this fallback?
I do not believe so! This is registered on the CompositeExplicitAutograd key, which is an alias key. The alias key includes the PrivateUse1 backend key, so the kernel registered there is what gets dispatched to for PrivateUse1 unless another kernel is specifically registered for that key.
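If you later decide to override the fallback with a native fused kernel, a minimal sketch of the registration would look like the following. This assumes the op name is `aten::_fused_adam_` and that the parameter list below matches the schema in native_functions.yaml for your PyTorch version; the exact signature here is from memory, so it is worth double-checking (registration will fail loudly if it does not match).

```cpp
#include <torch/library.h>
#include <ATen/ATen.h>

// Hypothetical backend kernel: a real backend (e.g. pytorch_dlprim) would
// launch its device-side fused Adam implementation here. The parameter list
// is an assumption and must match the aten::_fused_adam_ schema exactly.
void fused_adam_privateuse1(
    at::TensorList params,
    at::TensorList grads,
    at::TensorList exp_avgs,
    at::TensorList exp_avg_sqs,
    at::TensorList max_exp_avg_sqs,
    at::TensorList state_steps,
    double lr,
    double beta1,
    double beta2,
    double weight_decay,
    double eps,
    bool amsgrad,
    bool maximize,
    const c10::optional<at::Tensor>& grad_scale,
    const c10::optional<at::Tensor>& found_inf) {
  // ... call into the backend's fused Adam kernel here ...
}

// Registering on the PrivateUse1 key makes the dispatcher pick this kernel
// instead of the CompositeExplicitAutograd fallback, for this op only; every
// op you do not register keeps using the fallback.
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("_fused_adam_", TORCH_FN(fused_adam_privateuse1));
}
```

Until such a registration exists, the CompositeExplicitAutograd implementation keeps being used for PrivateUse1 automatically, which is exactly the fallback described above.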