Does PyTorch offer any means to convert PyTorch models to Triton for non-GPU devices? It seems that TorchInductor does this, but only for device="cuda". I am working on a custom non-GPU device. Is there a way to hack my way into TorchInductor to generate Triton kernels for device="cpu"?
Do you want to run Triton on a CPU device? That seems difficult, or at least complicated. You can set the environment variable TORCH_COMPILE_DEBUG=1 and then use torch.compile(m, backend="inductor") to compile the model; afterwards you will find a Triton code file in a debug directory. See https://pytorch.org/tutorials/intermediate/inductor_debug_cpu.html. On GPU, the Triton compiler lowers Triton-IR to LLVM IR, and then libLLVM generates PTX code to run on the GPU. I think that if you want to run on CPU, you could generate CPU binaries from that LLVM IR through the LLVM libraries. I have not checked the LLVM IR generated by Triton, so this is an approximate guess.
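A minimal sketch of that debug workflow (the toy model is just for illustration; set the variable before importing torch so Inductor picks it up):

```python
import os
os.environ["TORCH_COMPILE_DEBUG"] = "1"  # must be set before importing torch

import torch

# Toy model, just to trigger Inductor codegen.
m = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())

compiled = torch.compile(m, backend="inductor")
compiled(torch.randn(4, 16))  # first call triggers compilation

# Inspect ./torch_compile_debug/ for the generated output code
# (C++ for device="cpu", Triton for device="cuda") plus FX graph dumps.
```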
There isn't a CPU backend for Triton yet, so TorchInductor generates C++.
@bertmaher is working on a CPU backend for Triton though.
Yes, non-GPU backends can be added to Triton as well. That's why I am wondering how the torch-to-Triton conversion could work for CPU.
Actually, I want to generate Triton kernels from a PyTorch model when there is no CUDA device, i.e., to get Triton code out of torch.compile(m, backend="inductor") when device="cpu", and then to get LLVM IR from those Triton kernels using a custom non-GPU Triton backend.
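For the LLVM IR step: once a kernel has been compiled by any working Triton backend, the intermediate representations are exposed on the compiled object's asm dict. A sketch, assuming a recent Triton release (the asm keys and the fact that a launch returns the compiled kernel may differ across versions, and this needs a device Triton can actually target):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Elementwise add, one block per program instance.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

n = 4096
x = torch.randn(n, device="cuda")  # or whatever device your Triton backend targets
y = torch.randn(n, device="cuda")
out = torch.empty_like(x)

# In recent Triton versions, launching returns the compiled kernel object.
compiled = add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)

print(compiled.asm.keys())   # e.g. "ttir", "ttgir", "llir", "ptx" on NVIDIA
print(compiled.asm["llir"])  # the LLVM IR a custom backend would lower further
```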
Were you ever able to figure out how to access the generated TorchInductor Triton kernels when running PyTorch for device="cpu"?
Yes, but I had to manually modify the TorchInductor code to make it generate Triton kernels for CPU.
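For later readers: recent PyTorch builds expose an experimental Inductor config flag for exactly this, so the manual patching may no longer be necessary. A sketch, assuming a PyTorch version that has the cpu_backend option and a Triton build with a CPU backend (e.g. triton-cpu) installed:

```python
import torch
import torch._inductor.config as inductor_config

# Experimental: ask Inductor to emit Triton instead of C++ for CPU kernels.
# Only works if the installed Triton can target the CPU.
inductor_config.cpu_backend = "triton"

m = torch.nn.Linear(8, 8)
compiled = torch.compile(m, backend="inductor")
compiled(torch.randn(2, 8))  # generated CPU kernels are now Triton
```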