PyTorch to Triton for Non-GPU Devices

Does PyTorch offer any means to convert PyTorch code to Triton for non-GPU devices?
It seems that TorchInductor does this, but only for device="cuda". I am working on a custom non-GPU device. Is there a way to hack my way into TorchInductor to generate Triton kernels for device="cpu"?

Do you want to run Triton on a CPU device? That seems impossible, or at least complicated. You can set the environment variable TORCH_COMPILE_DEBUG=1 and then call torch.compile(m, backend="inductor") to compile the model; the generated Triton code is then written to a file in the debug output directory. See https://pytorch.org/tutorials/intermediate/inductor_debug_cpu.html

After that, the Triton compiler lowers Triton IR to LLVM IR and then uses LLVM to generate PTX code that runs on the GPU. I think that if you want to run on a CPU, you could generate CPU binaries from that LLVM IR with the LLVM libraries. I have not checked the LLVM IR that Triton generates, so this is only a rough guess.
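
A rough sketch of the debug-dump step, in case it helps. The helper names (compile_with_debug, find_generated_code, contains_triton) are made up for illustration, and the directory name torch_compile_debug and file name output_code.py are assumptions about Inductor's default debug layout that may differ between PyTorch versions. The "@triton.jit" check is only a text heuristic for telling Triton output apart from the C++ that Inductor emits on CPU.

```python
import os
from pathlib import Path

# Assumption: Inductor writes its debug artifacts under ./torch_compile_debug
# and names the generated wrapper module output_code.py.
DEBUG_ROOT = Path("torch_compile_debug")


def compile_with_debug():
    """Compile a toy function with Inductor while debug dumps are enabled."""
    # Set the flag before importing torch so the config is sure to see it.
    os.environ["TORCH_COMPILE_DEBUG"] = "1"
    import torch

    def toy(x):
        return torch.relu(x) + 1.0

    # On a CUDA device Inductor emits Triton; on CPU it emits C++ instead.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    compiled = torch.compile(toy, backend="inductor")
    return compiled(torch.randn(1024, device=device))


def find_generated_code(root=DEBUG_ROOT):
    """Return every output_code.py found below the debug directory."""
    root = Path(root)
    return sorted(root.rglob("output_code.py")) if root.is_dir() else []


def contains_triton(path):
    """Heuristic: generated Triton kernels carry the @triton.jit decorator."""
    return "@triton.jit" in Path(path).read_text()
```

Calling compile_with_debug() once and then running contains_triton() over the paths returned by find_generated_code() should show whether a given run produced Triton kernels or C++.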

There isn’t a CPU backend for Triton yet, so TorchInductor generates C++.

@bertmaher is working on a CPU backend for Triton though.

Yes, non-GPU backends can be added to Triton as well. That’s why I am wondering how the torch -> Triton conversion could work for CPU.

Actually, I want to generate Triton kernels from a PyTorch model when there is no CUDA device, i.e. get Triton code out of torch.compile(m, backend="inductor") when device="cpu", and then get LLVM IR from those Triton kernels using a custom non-GPU Triton backend.

Were you ever able to figure out how to access the generated TorchInductor Triton kernels when running PyTorch for device="cpu"?

Yes, but I had to manually modify the TorchInductor code to make it generate Triton kernels for CPU.