Does PyTorch offer any means to convert PyTorch code to Triton for non-GPU devices? It seems that TorchInductor does this, but only for device="cuda". I am working on a custom non-GPU device. Is there a way to hack TorchInductor into generating Triton kernels for device="cpu"?
Do you want to run Triton on a CPU device? That seems impossible, or at least complicated. You can set the environment variable TORCH_COMPILE_DEBUG=1 and then compile the model with torch.compile(m, backend="inductor"); the generated code (Triton kernels when compiling for CUDA) is written to a debug output directory. See https://pytorch.org/tutorials/intermediate/inductor_debug_cpu.html. From there, the Triton compiler lowers Triton IR to LLVM IR, and libLLVM then generates PTX code to run on the GPU. I think if you want to run on the CPU, you could generate CPU binaries from that LLVM IR through the LLVM library. I have not checked the LLVM IR generated by Triton, so this is an approximate guess.
There isn’t a CPU backend for Triton yet, so TorchInductor generates C++.
@bertmaher is working on a CPU backend for Triton though.
Yes, non-GPU backends can be added to Triton as well. That's why I am wondering how the torch -> Triton conversion could work for CPU.
Actually, I want to generate Triton kernels from a PyTorch model when there is no CUDA device, i.e. get Triton code out of torch.compile(m, backend="inductor") when device="cpu", and then get LLVM IR from these Triton kernels using a custom non-GPU Triton backend.