Implementing an OpenCL backend for PyTorch

I started going over this tutorial: "Extending dispatcher for a new backend in C++" (PyTorch Tutorials 1.9.1+cu102 documentation).

I created a custom function plus its backward for CPU, and it looks OK: I can link, call, and run the custom forward/backward on a CPU tensor.

Now I want to extend this to a private-use dispatch key so I can start prototyping an OpenCL backend. What is not clear to me is the following:

How do I create a tensor for, or copy a tensor to, a privateuse dispatch key?

In Python I can call torch.randn(10).to('cuda'). How do I do the same for a private-use key? What am I missing?