Custom device using TensorIterator

Hi, team. TensorIterator is a great helper class for many ops, so we want to use TensorIterator on our custom device side as well.
On the custom device side, we register TensorIterator-based ops just like the CUDA side does: the codegen produces a structured op class that inherits from TensorIteratorBase. For example: structured_remainder_out_functional → at::native::structured_remainder_out → at::meta::structured_remainder_Tensor → TensorIteratorBase.
However, there are a few differences between the custom device and the GPU: the custom device does not fully support arbitrary tensor strides or dynamic types. So we need to modify the operands stored in TensorIteratorBase, and read some settings from TensorIteratorConfig. Currently, neither of these can be accessed from outside the TensorIteratorBase scope.
So, could we add some public APIs to access the operands, and store the TensorIteratorConfig in TensorIteratorBase, exposed as a const reference?
Data flow on GPU vs. custom device:
GPU:
op.meta() calls the TensorIteratorBase build function and produces a configured TensorIteratorBase instance;
op.impl() calls the op kernel with the TensorIteratorBase instance and the input/output tensors.
Custom device:
op.meta() calls the TensorIteratorBase build function and produces a configured TensorIteratorBase instance;
do_something(op): calls a local TensorIteratorBridge with the TensorIteratorBase instance as input. Inside TensorIteratorBridge, we modify some operands based on our device's attributes;
op.impl() calls the op kernel function and does whatever is needed to keep the behavior consistent with PyTorch.