How to Access Triton Kernels from TorchInductor when running on CPU?

When running torch.compile on a PyTorch model and accessing the generated TorchInductor Python code for the graph with C++/Triton kernels, as indicated here, there appears to be no way to access the generated Triton code when using a CPU device.

It appears that all of the kernels are generated as C++ when running on CPU, whereas on CUDA they are generated as Triton. Does PyTorch / TorchInductor offer a way to access generated Triton kernels when running on a CPU device?
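
For reference, here is a minimal sketch of how I am dumping the generated code to see this, assuming the `TORCH_COMPILE_DEBUG` environment variable and `torch._logging.set_logs(output_code=True)` behave as documented; the toy function `f` is just an example:

```python
import os

# Ask Inductor to keep its debug artifacts (a torch_compile_debug/ directory
# containing output_code.py). The variable is read when torch is imported,
# so set it first.
os.environ["TORCH_COMPILE_DEBUG"] = "1"

import torch

# Also print the generated wrapper code to the log as it is compiled.
torch._logging.set_logs(output_code=True)

def f(x, y):
    return (x + y).relu().sum()

compiled = torch.compile(f)

# On CPU the emitted code calls C++ kernels; on a CUDA device the same dump
# contains @triton.jit kernels instead.
compiled(torch.randn(1024), torch.randn(1024))
```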

I can take care of running Triton on a CPU backend, but I need to be able to access the Triton kernels that are generated from PyTorch first.

Hi @JoeLi12345, thanks for the question. The Triton codegen backend for CPU has not been enabled in Inductor yet; only C++ codegen is available for now.
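
If you just want to inspect what Inductor emits on CPU today, one option is the helper below (a sketch; `run_and_get_code` lives in `torch._inductor.utils` and is an internal test utility, so it may change between releases):

```python
import torch
from torch._inductor.utils import run_and_get_code  # internal helper, may change

def f(x):
    return torch.nn.functional.gelu(x) * 2

compiled = torch.compile(f)

# Returns the compiled function's result together with the source code that
# Inductor generated for it. On a CPU device this source wraps C++ kernels
# only; there are no Triton kernels to extract.
result, (code,) = run_and_get_code(compiled, torch.randn(128))
print(code)
```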