The future of C++ model deployment

I wasn’t sure the best place to ask this. Currently we rely heavily on torchscript as a mechanism for defining models in Python and then compiling them into a program that can be executed into C++. The PyTorch ecosystem appears to be moving away from torchscript and towards torchdynamo based tracing, which gives us some nice performance benefits, but does not produce an artefact that can be executed in C++ (e.g. supports arbitrary Python operations with graph breaks, the Triton kernels from torchinductor require a Python runtime).

We are currently developing a mechanism for compiling Triton kernels ahead of time + codegen for creating torchscript extensions to execute them, but I just wanted to know if something similar or alternative was on the PyTorch roadmap.

1 Like

We are working on a torch.export along with an AOTInductor (Inductor but can produce a .so blob of its targets) that will enable you to do C++ deployment. Active work is happening here, with @desertfire being the POC.

Great, thanks! @desertfire is there any WIP branch/issue that I can track?