Memory operations on a custom backend


I am working on the integration of a new device in PyTorch.
I have been reading the (good) documentation on adding a new backend, and so far things seem pretty simple: add operator implementations under a custom dispatch key and compile them as a C++ extension.
However, one piece of information is missing from my reading (sorry if I missed it in the docs): how do I make PyTorch handle the memory operations (allocation, memcpy, copies across different types of devices)?

I see a reference to VulkanOpaqueTensorImpl that could be helpful, but I struggle to see which bit I would modify to make allocations happen on my device.

What I was expecting to do is provide callback functions for memory management.
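To make the expectation concrete, this is roughly the kind of interface I had in mind: a table of callbacks that the backend fills in and the framework calls for every device-memory operation. The struct and function names below are entirely my own invention (not a PyTorch API), and the "device" here is just host memory to keep the sketch self-contained:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical callback table -- these names are invented for illustration,
// not an actual PyTorch interface. The backend fills in the function
// pointers; the framework calls them for allocation and data movement.
struct DeviceMemoryHooks {
    void* (*alloc)(std::size_t nbytes);
    void  (*free_fn)(void* ptr);
    void  (*copy_from_host)(void* dst, const void* src, std::size_t nbytes);
    void  (*copy_to_host)(void* dst, const void* src, std::size_t nbytes);
};

// A toy "device" backed by plain host memory, just to show the shape of it.
static void* toy_alloc(std::size_t n) { return std::malloc(n); }
static void  toy_free(void* p)        { std::free(p); }
static void  toy_h2d(void* d, const void* s, std::size_t n) { std::memcpy(d, s, n); }
static void  toy_d2h(void* d, const void* s, std::size_t n) { std::memcpy(d, s, n); }

static DeviceMemoryHooks g_hooks{toy_alloc, toy_free, toy_h2d, toy_d2h};
```

With something like this, a round trip would be: `g_hooks.alloc`, `copy_from_host`, compute, `copy_to_host`, `g_hooks.free_fn`.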

Can I get a little help with this? Maybe some reference will do.
Thank you!

You could take a look at add open device registration test with cpp extensions by bdhirsh · Pull Request #80477 · pytorch/pytorch · GitHub.


Yep! The PR that Bairen linked shows a basic example of adding a custom device with its own allocator, and copying to/from CPU. Let me know if you have any questions!

That PR also shows a basic example of open registration (adding a new PyTorch device in C++, with no changes needed to PyTorch core). Better docs for it coming soon!
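For anyone landing here later, the allocation side of what that PR does looks roughly like the sketch below. The real interface is `c10::Allocator`: you override `allocate()` to return a `c10::DataPtr` tagged with your device, register the allocator for the `PrivateUse1` device type, and register the relevant aten ops (allocation and copy) under the `PrivateUse1` dispatch key. The stripped-down classes here only model that shape so the snippet compiles without the torch headers; treat the exact signatures as illustrative and consult the PR for the real ones:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stripped-down stand-ins for c10::DataPtr / c10::Allocator so this
// compiles standalone. In a real extension you would include
// <c10/core/Allocator.h> and subclass c10::Allocator instead.
struct DataPtr {
    void* ptr;
    void (*deleter)(void*);
    DataPtr(void* p, void (*d)(void*)) : ptr(p), deleter(d) {}
    DataPtr(DataPtr&& o) noexcept : ptr(o.ptr), deleter(o.deleter) { o.ptr = nullptr; }
    DataPtr(const DataPtr&) = delete;
    ~DataPtr() { if (ptr) deleter(ptr); }
};

struct Allocator {
    virtual ~Allocator() = default;
    virtual DataPtr allocate(std::size_t nbytes) = 0;
};

// The custom-device allocator: this is where your device's own
// malloc/free equivalents would go instead of std::malloc/std::free.
struct MyDeviceAllocator final : Allocator {
    static void deleter(void* p) { std::free(p); }      // device free goes here
    DataPtr allocate(std::size_t nbytes) override {
        return DataPtr(std::malloc(nbytes), &deleter);  // device alloc goes here
    }
};

// In the real extension, registration then looks roughly like:
//   static MyDeviceAllocator g_alloc;
//   REGISTER_ALLOCATOR(c10::DeviceType::PrivateUse1, &g_alloc);
// plus TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) blocks implementing
// ops such as empty.memory_format and _copy_from for data movement.
```

The copy-across-devices part is handled the same way as any other op: you implement the copy kernel and register it for your dispatch key, as the linked PR demonstrates.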


Thanks, that’s what I was looking for!
So it’s all about the ATen library.
I am new to PyTorch dev, so I was missing that bit… I really appreciate the quality of the documentation, thanks to the team!

Would it be possible to get a summary of the steps needed, or things to consider, for integrating a custom accelerator aside from the kernel registrations? Or does that PR already contain everything?