Embrace tensor subclass as a Python device registration API

Hey!

Thanks for taking the time to write down this proposal !

The experience of the above journey is overall pleasant, but it still has a few issues:

  1. This is ~expected as of today, in particular because we have been focusing torch.compile/subclass support on subclasses that are “wrappers around other Tensors that eventually desugar into ops on plain Tensors”.
    I think you can make your design fit this mold, but it might be awkward: store all the data in another Tensor that is just a holder, and translate everything into your own custom ops (see the PyTorch Custom Operators tutorial).
    Then compile will “desugar” into these ops and run them as black boxes.

  2. IIRC the FakeTensor trick is pretty simple: see torch/_subclasses/fake_tensor.py in the pytorch/pytorch repo on GitHub.
    I would agree with Ed on the issue: if you can, have the reported device be accurate based on the device you actually want to use. It is also a bit tricky, and while FakeTensor helped us clean up a lot of things, there are most likely a few rough edges left.
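As a small illustration of the FakeTensor mechanism mentioned in 2. (a sketch only; `FakeTensorMode` lives in a private module, `torch._subclasses.fake_tensor`, and its internals may change):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Under FakeTensorMode, tensor factories produce FakeTensors: the storage
# lives on the meta device, but the tensor still *reports* a regular device,
# so shape/dtype/device propagation runs without any real allocation.
with FakeTensorMode():
    x = torch.empty(4, 4)
    y = x @ x  # only metadata is propagated, no actual matmul happens

print(type(y).__name__)  # -> FakeTensor
print(y.shape, y.device)
```

This is why an accurate device on the fake tensor matters: downstream tracing logic reads `y.device` to decide which kernels and device guards to emit.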

While we are working with @janeyx99 to provide ABI stability for a subset of libtorch, we are focusing on custom kernel writers right now, not out-of-tree devices.
This work will help if you go down the path of a Python-only Tensor subclass + custom ops in C++.
But if you need to use the PrivateUse1 extension points, this will not be covered in the current plan.

> If we define a custom C++ torch.Tensor subclass described in 1., can that class live
> in the upstream so there’s no torch header dependency from a backend? This class
> is generic enough for any backend to use. Maybe give it its own dispatch key so
> we don’t need to overload PrivateUse1?

I’m not sure I understand what you mean here, and I have a couple of questions:

  • What blocks you today from doing all of this with the subclass?
  • The second concern for most backend writers, once they have something that works, is performance. I am not sure what the actual performance characteristics of this approach will be, or whether the overhead will be acceptable.
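For context on the “overload privateuseone” point: today an out-of-tree backend reuses the single PrivateUse1 dispatch key by renaming it from Python; there is no mechanism to allocate a fresh key per backend. A rough sketch of that flow (the “myaccel” name and the stub module are made up; real kernels would still have to be registered in C++ for the device to be usable):

```python
import torch

# Claim the PrivateUse1 key under a custom backend name (illustrative name).
torch.utils.rename_privateuse1_backend("myaccel")

# Register a (stub) device module so torch-level queries have somewhere to go.
class _MyAccelModule:
    @staticmethod
    def is_available():
        return True

torch._register_device_module("myaccel", _MyAccelModule)

# The name is now a valid device type, even though no kernels exist yet.
d = torch.device("myaccel", 0)
print(d)  # -> myaccel:0
```

Since there is only one PrivateUse1 key, two backends cannot coexist in one process this way, which is presumably what motivates the “give it its own dispatch key” question above.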