Inductor Triton Custom Op

If I register a custom op using the torch.library.Library API that calls a triton.jit kernel, and then compile a module containing this custom op with cpp_wrapper enabled, is the cubin of the Triton kernel embedded in the generated CUDA extension?

How does this differ from a module containing only (non-custom) ATen ops that is compiled with Inductor, lowered to Triton kernels through the Inductor lowering pipeline, and then output with the cpp_wrapper option?
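For concreteness, the setup described in the question might look like the following sketch. The namespace `mylib`, the op name `my_add`, and the kernel are hypothetical illustrations, not code from the thread; only the kernel is defined and registered here, so no GPU is needed until the op is actually called.

```python
import torch
import triton
import triton.language as tl

# Hypothetical namespace and op name, for illustration only.
lib = torch.library.Library("mylib", "DEF")
lib.define("my_add(Tensor x, Tensor y) -> Tensor")

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def my_add_cuda(x, y):
    # Python launcher that the custom op dispatches to on CUDA tensors.
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Register the Triton launcher as the CUDA implementation of the op.
lib.impl("my_add", my_add_cuda, "CUDA")
```

A module could then call `torch.ops.mylib.my_add(x, y)` and be compiled with `torch.compile`; the question is what happens to the kernel's cubin when cpp_wrapper is enabled.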


Hi there! We have implemented native support for Triton kernels in torch.compile, so you do not need to convert them to a custom op. In their current state, they get compiled just like the other Inductor-emitted Triton kernels. For AOT Inductor (the C++ version), we emit a cubin file just as we do for the other kernels.
We plan to support using this inside custom ops but haven't built that yet. See @zou3519 for more information.
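The native path described above means a triton.jit kernel can be called directly from code passed to torch.compile, with no custom-op registration. A minimal sketch (the kernel and function names are hypothetical; the kernel launch itself is guarded so it only runs when a CUDA device is present):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def mul2_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Doubles each element; one program instance per BLOCK_SIZE slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * 2, mask=mask)

def double(x):
    out = torch.empty_like(x)
    n = x.numel()
    mul2_kernel[(triton.cdiv(n, 1024),)](x, out, n, BLOCK_SIZE=1024)
    return out

# torch.compile traces into the user-defined Triton kernel directly;
# no torch.library custom op is needed for this path.
compiled_double = torch.compile(double)

if torch.cuda.is_available():
    x = torch.randn(4096, device="cuda")
    assert torch.allclose(compiled_double(x), x * 2)
```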


When is this version expected to be released?

We are targeting PyTorch 2.3 for the official release. It should be working in the nightlies, but we want to clean up all performance and composability problems before the official release.

Thank you for your prompt response. I appreciate your time in providing the information.