Inductor Triton Custom Op

jeromeku · December 6, 2023, 8:37am

If I register a custom op using the torch.library.Library API that calls a triton.jit kernel, and then compile a module containing this custom op with cpp_wrapper enabled, is the cubin of the Triton kernel embedded in the generated CUDA extension?

How does this differ from a module with only (non-custom) aten ops that are compiled using inductor and lowered into triton kernels (through the inductor lowering pipeline) then output using the cpp_wrapper option?

oulgen · December 19, 2023, 5:21am

Hi there! We have implemented native support for Triton kernels in torch.compile, so you do not need to convert them to a custom op. In their current state, they get compiled just like the other Inductor-emitted Triton kernels. For AOT Inductor (the C++ version), we emit a cubin file, just as we do for the other kernels.
We plan on supporting this in custom ops but haven’t built that yet. @zou3519 for more information.
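
For readers who want to see what "native support" looks like in practice, here is a minimal sketch of a triton.jit kernel launched directly inside a torch.compile'd function, with no custom op registration. The kernel name `add_kernel`, the wrapper `add`, and the block size are illustrative choices, not taken from this thread, and the sketch assumes a CUDA device plus a PyTorch build that includes this feature.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail when n is not a multiple of BLOCK_SIZE
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


@torch.compile(fullgraph=True)
def add(x, y):
    # The Triton kernel is launched directly; no torch.library registration involved.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out


# Only run the kernel when a GPU is present (assumption: CUDA backend).
if torch.cuda.is_available():
    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    torch.testing.assert_close(add(x, y), x + y)
```

With `fullgraph=True`, compilation fails loudly if the kernel launch cannot be traced, which is a quick way to check whether a given PyTorch build supports this path.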

YiNANzhang · December 29, 2023, 9:21am

When is this version expected to be released?

oulgen · December 29, 2023, 5:46pm

We are targeting PyTorch 2.3 for the official release. It should already work on the nightlies, but we want to clean up all performance and composability problems before the official release.

YiNANzhang · January 2, 2024, 12:32am

Thank you for your prompt response. I appreciate your time in providing the information.

jeromeku · March 25, 2025, 2:29am

@oulgen

In your tutorial on User-Defined Triton Ops, you mention that wrap_triton + triton_op are necessary when composing custom Triton kernels with tensor subclasses.

Why is this, and are there any examples demonstrating this composability? I couldn’t find any meaningful examples (I searched through the torch/inductor tests, especially test_triton_kernels).

oulgen · March 25, 2025, 5:00am

@zou3519 do you have any examples for this?
