Aten::mul.out receives tensors from mixed devices

I’m working on an OpenCL backend for PyTorch. While implementing the operations needed for Adam, I got stuck on the following case:


Tensor & mul_out(const Tensor & self, const Tensor & other, Tensor & out)
{
    // Debug output: device and element count of each argument
    std::cerr << "Self:" << self.device() << " " << self.numel() << std::endl;
    std::cerr << "Other:" << other.device() << " " << other.numel() << std::endl;
    std::cerr << "Out:" << out.device() << " " << out.numel() << std::endl;
    return out;  // actual multiplication omitted here
}

As expected, self and out were on the opencl device, but other was a single-element CPU tensor:

Self:opencl:1 288
Other:cpu 1
Out:opencl:1 288

This surprised me, since I expected to receive only tensors for my own device/backend. Why wasn’t the call dispatched to mul_.Scalar or some other op?

Do I need to treat a CPU tensor passed alongside a GPU tensor as something normal? How should I handle this situation (besides manually checking that other is a CPU tensor of size 1)?

In general, I’d probably try to match what PyTorch does with CUDA, but I guess you are already doing that.

Best regards


I see. It is strange, since I expected the call to be dispatched to the Scalar overload.

In any case, I added handling for the CPU scalar tensor case.
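For readers following along, here is a rough sketch of what such handling could look like. It is not the actual backend code: MockTensor, Device, is_cpu_scalar, mul_scalar_kernel and mul_tensor_kernel are hypothetical stand-ins so the example compiles without libtorch or OpenCL, and the "kernels" are plain host loops. The idea is only the branching: if other is a one-element CPU tensor, read its value on the host and run a tensor-times-scalar kernel; otherwise require matching devices.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for at::Device / at::Tensor so the sketch compiles
// without libtorch; a real backend would work on at::Tensor directly.
enum class Device { CPU, OpenCL };

struct MockTensor {
    Device device;
    std::vector<float> data;  // host storage used for every device in this mock
    std::size_t numel() const { return data.size(); }
};

// True when `t` is a one-element CPU tensor, the case seen in the log above.
inline bool is_cpu_scalar(const MockTensor& t) {
    return t.device == Device::CPU && t.numel() == 1;
}

// Stand-in for a "tensor * scalar" device kernel (host loop in this mock).
inline void mul_scalar_kernel(const MockTensor& self, float value, MockTensor& out) {
    for (std::size_t i = 0; i < self.numel(); ++i) {
        out.data[i] = self.data[i] * value;
    }
}

// Stand-in for an elementwise "tensor * tensor" device kernel, no broadcasting.
inline void mul_tensor_kernel(const MockTensor& self, const MockTensor& other, MockTensor& out) {
    for (std::size_t i = 0; i < self.numel(); ++i) {
        out.data[i] = self.data[i] * other.data[i];
    }
}

MockTensor& mul_out(const MockTensor& self, const MockTensor& other, MockTensor& out) {
    if (out.numel() != self.numel()) {
        throw std::invalid_argument("out must have the same number of elements as self");
    }
    if (is_cpu_scalar(other)) {
        // Read the single value on the host and hand it to the kernel by value.
        mul_scalar_kernel(self, other.data[0], out);
        return out;
    }
    if (other.device != self.device) {
        throw std::invalid_argument("expected all tensors to be on the same device");
    }
    if (other.numel() != self.numel()) {
        throw std::invalid_argument("broadcasting is not handled in this sketch");
    }
    mul_tensor_kernel(self, other, out);
    return out;
}
```

With a three-element OpenCL self of {1, 2, 3} and a CPU other holding 2, out ends up as {2, 4, 6}, while a multi-element CPU other is rejected. As far as I know, the CUDA path in PyTorch special-cases CPU scalars in a similar spirit, but treat that as an assumption and check the CUDA sources before relying on it.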

So for the time being I can continue.

At this point I have fully working MNIST training, plus stock AlexNet and VGG16, running over OpenCL.

to be continued