Autograd graphs when moving tensors between devices

Hello everyone!
Now that I have a little experience with PyTorch's backend, I am wondering about a quirk I have noticed:
As far as I know, tensors lose their autograd graph when we move them between devices, because devices differ in nature and operations cannot be traced across them. This seems fair to me.
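For concreteness, this is roughly the kind of check I have been doing (a minimal sketch, assuming a CUDA device is available):

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor on the CPU
y = x.to("cuda")                        # copy to the GPU
z = (y * 2).sum()

# Inspect what the device copy does to the autograd graph.
print(x.is_leaf, y.is_leaf)  # the original is a leaf, the copy is not
print(y.grad_fn)             # grad_fn recorded for the copied tensor, if any
z.backward()
print(x.grad)                # does the gradient flow back to the CPU tensor?
```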

However, in the CUDA linalg backend there are plenty of places where tensors are moved between CPU and CUDA (many of the Magma paths seem to do this, for example).

This seems to contradict the idea that copying between devices is fundamentally problematic: apparently there are cases where moving between devices is fine. Does someone know exactly what the limiting factors are? Maybe the graph only breaks once operations are actually performed on the new device?
I would find this highly interesting, as VRAM usage seems to be one of THE pressing issues in AI today (and, coincidentally, of my research). IMO it would be great to allow some kind of caching to CPU RAM, so that unused tensors could be moved from VRAM to system RAM, if this is at all possible.
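To sketch what I mean by caching (the helper names here are purely hypothetical, not an existing PyTorch API):

```python
import torch

# Hypothetical helpers: park a tensor that is not needed right now in
# system RAM, and bring it back to the GPU only when it is used again.
_cpu_cache = {}

def offload(name, t):
    # Copy to pinned CPU memory so the later copy back to the GPU can
    # overlap with compute via non_blocking=True.
    _cpu_cache[name] = t.to("cpu", non_blocking=True).pin_memory()

def restore(name, device="cuda"):
    return _cpu_cache.pop(name).to(device, non_blocking=True)
```

Whether the autograd graph would survive such a round trip is exactly the part I am unsure about.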
Maybe I am also just plain wrong about this and it is completely impossible. In that case, I would be very happy to learn why.

Best
Johannes