I wasn’t sure where best to ask this. We currently rely heavily on TorchScript as a mechanism for defining models in Python and then compiling them into a program that can be executed from C++. The PyTorch ecosystem appears to be moving away from TorchScript and towards TorchDynamo-based tracing, which gives us some nice performance benefits but does not produce an artefact that can be executed in C++ (for example, it supports arbitrary Python operations via graph breaks, and the Triton kernels generated by TorchInductor require a Python runtime).
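For concreteness, the TorchScript workflow described above looks roughly like the following sketch. `TinyModel` is a made-up module used purely for illustration, and the in-memory buffer stands in for whatever file the real pipeline would write.

```python
import io

import torch


class TinyModel(torch.nn.Module):
    """Hypothetical model, for illustration only."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scripting keeps this data-dependent branch inside the compiled program.
        if x.sum() > 0:
            return x * 2
        return x - 1


# Compile the Python-defined module to TorchScript.
scripted = torch.jit.script(TinyModel())

# Serialise to an in-memory buffer (assumption: a real pipeline would save to a file).
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Reload the serialised program; no Python class definition is needed at this point.
buffer.seek(0)
reloaded = torch.jit.load(buffer)
```

A file saved this way can be loaded from C++ with `torch::jit::load` in libtorch, which is the part of the workflow that has no Python dependency.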
We are currently developing a mechanism for compiling Triton kernels ahead of time, plus codegen for creating TorchScript extensions that execute them, but I wanted to know whether something similar, or an alternative, is on the PyTorch roadmap.