CUDA kernel with a customised device

Is there a way to launch CUDA kernels with a customised device (PrivateUse1…), or to customise CUDA and create a new device on top of it by overloading some operations?

Which devices? In general, there is a CUDA alternative called OpenCL, which supports various GPUs. It currently works as an out-of-tree backend. It supports a significant number of torch operators but is far from full-featured: