- The dispatcher will typically use the first tensor argument to decide where to dispatch to. You might find the discussion at https://github.com/pytorch/pytorch/blob/6831d8e/aten/src/ATen/native/README.md#device_guard useful.
- If you scroll down at the above link, there is a discussion of how ATen usually generates checks that all your tensors are on the same device (https://github.com/pytorch/pytorch/blob/6831d8e379392da1340a28fdb3e7e1382176d1d4/aten/src/ATen/core/op_registration/adaption.h#L48, with the calls inserted by the codegen in tools/). This would probably be the right thing to check in your custom dispatch targets, too.
- As you note, multiplication with scalars is handled by a different overload of mul (there is a difference between how shape-() tensors are handled vs. shape-(1,) tensors). In general, I'd probably try to match what PyTorch does with CUDA, but I guess you are already doing that.
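To illustrate the same-device check mentioned above, here is a minimal Python sketch of the pattern that ATen's codegen inserts (see adaption.h linked above): the first tensor argument fixes the expected device, and every later tensor argument must match it. The names `FakeTensor`, `check_and_update_common_device`, and `my_mul` are illustrative stand-ins, not the actual PyTorch API; a real custom dispatch target would of course operate on real tensors.

```python
from typing import Optional


class FakeTensor:
    """Minimal stand-in for a tensor that only carries a device string."""

    def __init__(self, device: str):
        self.device = device


def check_and_update_common_device(common_device: Optional[str],
                                   tensor: FakeTensor,
                                   method_name: str,
                                   arg_name: str) -> str:
    # The first tensor seen determines the expected device;
    # every subsequent tensor argument must be on that same device.
    if common_device is None:
        return tensor.device
    if tensor.device != common_device:
        raise RuntimeError(
            f"Expected all tensors to be on the same device, but got "
            f"{arg_name} on {tensor.device}, expected {common_device} "
            f"(while checking arguments for {method_name})")
    return common_device


def my_mul(a: FakeTensor, b: FakeTensor) -> str:
    # A custom dispatch target would run this check up front,
    # before selecting and running the device-specific kernel.
    device = None
    device = check_and_update_common_device(device, a, "my_mul", "a")
    device = check_and_update_common_device(device, b, "my_mul", "b")
    return device  # dispatch would now pick the kernel for this device
```

With this, `my_mul(FakeTensor("cuda:0"), FakeTensor("cpu"))` raises a RuntimeError with a message in the same spirit as the one PyTorch's generated checks produce.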
Best regards
Thomas