Autograd graphs when moving tensors between devices

Hello everyone!
Now that I have a little experience with PyTorch's backend, I am wondering about a quirk I have noticed:
As far as I know, tensors lose their autograd graph when we move them between devices, because devices differ in nature and operations cannot be traced across them. This seems fair to me.
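For concreteness, this is roughly the kind of check I have been doing (a minimal sketch, assuming a CUDA device is available):

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor on the CPU
y = x.to("cuda")                        # copy to the GPU
z = (y * 2).sum()

# Inspect what the device copy does to the autograd graph.
print(x.is_leaf, y.is_leaf)  # the original is a leaf, the copy is not
print(y.grad_fn)             # grad_fn recorded for the copied tensor, if any
z.backward()
print(x.grad)                # does the gradient flow back to the CPU tensor?
```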

However, in the CUDA linalg backend there are plenty of places where tensors are moved between CPU and CUDA (many of the Magma paths seem to do this, for example).

This seems to contradict the idea that copying between devices is fundamentally problematic: apparently there are cases where moving between devices is fine. Does someone know exactly what the limiting factors are? Maybe the graph only breaks once operations are actually performed on the new device?
I would find this highly interesting, as VRAM usage seems to be one of THE pressing issues in AI today (and, coincidentally, of my research). IMO it would be great to allow some kind of caching to CPU RAM, so that unused tensors could be moved from VRAM to system RAM, if this is at all possible.
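To sketch what I mean by caching (the helper names here are purely hypothetical, not an existing PyTorch API):

```python
import torch

# Hypothetical helpers: park a tensor that is not needed right now in
# system RAM, and bring it back to the GPU only when it is used again.
_cpu_cache = {}

def offload(name, t):
    # Copy to pinned CPU memory so the later copy back to the GPU can
    # overlap with compute via non_blocking=True.
    _cpu_cache[name] = t.to("cpu", non_blocking=True).pin_memory()

def restore(name, device="cuda"):
    return _cpu_cache.pop(name).to(device, non_blocking=True)
```

Whether the autograd graph would survive such a round trip is exactly the part I am unsure about.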
Maybe I am also just plain wrong about this and it is completely impossible. In that case, I would be very happy to learn why.

Best
Johannes