PyTorch to Triton for Non-GPU Devices

fhossein-quic · April 17, 2024, 10:04pm

Does PyTorch offer any means to convert PyTorch code to Triton for non-GPU devices?
It seems that TorchInductor does this, but only for device="cuda". I am working on a custom non-GPU device. Is there a way to hack my way into TorchInductor to generate Triton kernels for device="cpu"?

TyFeng · April 21, 2024, 8:39pm

Do you want to run Triton on a CPU device? That seems impossible, or at least complicated. You can set the environment variable TORCH_COMPILE_DEBUG=1 and then compile the model with torch.compile(m, backend="inductor"); the generated Triton code is then written to a file in the debug output directory. See https://pytorch.org/tutorials/intermediate/inductor_debug_cpu.html

From there, the Triton compiler lowers Triton IR to LLVM IR, and LLVM is then used to generate PTX code that runs on the GPU. If you want to run on a CPU, I think you could generate CPU binaries from that LLVM IR with the LLVM libraries. I have not checked the LLVM IR that Triton generates, so this is only a rough guess.
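As a rough illustration of the TORCH_COMPILE_DEBUG workflow mentioned in this reply, the sketch below compiles a model with the Inductor backend and then scans the debug directory for the generated wrapper files. It assumes the layout shown in the linked tutorial (a torch_compile_debug directory containing output_code.py files). The helper names compile_with_debug and find_generated_code are hypothetical, and the string checks used to tell Triton output from C++ output are heuristics, not an official API.

```python
import os
from pathlib import Path


def compile_with_debug(model, example_input):
    """Compile `model` with Inductor while debug dumps are enabled.

    Hypothetical helper. The flag is set before torch is imported, on the
    assumption that it is read when Inductor's config is first loaded; if
    torch is already imported, set the variable in the shell instead.
    """
    os.environ["TORCH_COMPILE_DEBUG"] = "1"
    import torch  # imported lazily so the scan helper below needs no torch

    compiled = torch.compile(model, backend="inductor")
    compiled(example_input)  # the first call triggers code generation
    return compiled


def find_generated_code(debug_root="torch_compile_debug"):
    """Return (path, kind) pairs for every output_code.py under debug_root.

    `kind` is a heuristic guess: "triton" if the file contains a
    @triton.jit kernel, "cpp" if it contains an async_compile.cpp call,
    otherwise "unknown".
    """
    root = Path(debug_root)
    if not root.is_dir():
        return []
    results = []
    for path in sorted(root.rglob("output_code.py")):
        text = path.read_text(errors="replace")
        if "@triton.jit" in text:
            kind = "triton"
        elif "async_compile.cpp" in text:
            kind = "cpp"
        else:
            kind = "unknown"
        results.append((path, kind))
    return results
```

For example, after calling compile_with_debug(model, x) on a machine without a CUDA device, find_generated_code() would be expected to report "cpp" entries rather than "triton" ones, since the generated kernels for CPU are C++.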

jansel · April 23, 2024, 6:20pm

There isn’t a CPU backend for Triton yet, so TorchInductor generates C++.

@bertmaher is working on a CPU backend for Triton though.

fhossein-quic · April 24, 2024, 3:16pm

Yes, non-GPU backends can be added to Triton as well. That’s why I am wondering how a torch -> Triton conversion could work for CPU.

fhossein-quic · April 24, 2024, 3:31pm

Actually, I want to generate Triton kernels from a PyTorch model when there is no CUDA device, i.e. get Triton code out of torch.compile(m, backend="inductor") when device="cpu", and then get LLVM IR from those Triton kernels using a custom non-GPU Triton backend.

JoeLi12345 · August 8, 2024, 11:14pm

Were you ever able to figure out how to access the generated TorchInductor Triton kernels when running PyTorch with device="cpu"?

fhossein-quic · August 19, 2024, 4:52pm

Yes, but I had to manually modify the TorchInductor code to make it generate Triton kernels for CPU.

jansel · August 30, 2024, 5:58pm

github.com/pytorch/pytorch

Add Triton CPU as an Inductor backend (#133408)

opened 05:15AM - 14 Aug 24 UTC by int3 · +406 -231

The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.
