The future of C++ model deployment

I wasn’t sure of the best place to ask this. Currently we rely heavily on TorchScript as a mechanism for defining models in Python and then compiling them into a program that can be executed from C++. The PyTorch ecosystem appears to be moving away from TorchScript and towards TorchDynamo-based tracing, which gives us some nice performance benefits but does not produce an artefact that can be executed in C++ (e.g. it supports arbitrary Python operations via graph breaks, and the Triton kernels produced by TorchInductor require a Python runtime).
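For reference, a minimal sketch of the TorchScript flow described above (the module and file names are just placeholders):

```python
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)

    def forward(self, x):
        return torch.relu(self.linear(x))

# Compile the Python model into a TorchScript program...
scripted = torch.jit.script(MyModel().eval())
# ...and serialize it so it can be loaded from C++ via torch::jit::load.
scripted.save("model.pt")
```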

We are currently developing a mechanism for compiling Triton kernels ahead of time, plus codegen for creating TorchScript extensions to execute them, but I wanted to know whether something similar or an alternative is on the PyTorch roadmap.


We are working on torch.export along with AOTInductor (Inductor, but able to produce a .so blob from its generated code), which will enable you to do C++ deployment. Active work is happening here, with @desertfire being the POC.
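To make that concrete, here is a rough sketch of the intended flow. It assumes the torch._export.aot_compile entry point used in the AOTInductor work; exact names may shift while this is under active development:

```python
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = MyModel().eval()
example_inputs = (torch.randn(4, 10),)

with torch.no_grad():
    # Compiles the model with Inductor ahead of time and returns the path
    # to a .so that a C++ runtime can load and run without Python.
    so_path = torch._export.aot_compile(model, example_inputs)
print(so_path)
```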

Great, thanks! @desertfire is there any WIP branch/issue that I can track?

Most of the Inductor codegen changes already live on the main branch. You can get some idea by looking at the history of pytorch/torch/_inductor/codegen/wrapper.py at main · pytorch/pytorch · GitHub. The runtime part, which makes it interact with TorchScript, is still being worked on and should be coming fairly soon.

Meanwhile, we have a dashboard to monitor the cpp_wrapper codegen for inference performance: PyTorch CI HUD (select Mode as inference and look for inductor_cpp_wrapper). Although that still uses Python, it is a good proxy for how robust and performant AOTInductor can be.

Following up on this
(as a side note, I can’t seem to find “inductor_cpp_wrapper” in the dashboard?)

but more specifically: as an exercise I am trying to generate a pure-C++, CPU-backed (on M1) .so from ResNet. I fail because of “ExternKernel”; specifically, the convolution lowering creates an IR node which derives from ExternKernelAlloc, and that one in particular has no cpp_kernel or cpp_codegen abilities.

I’m not 100% sure what “Extern” means in this context, and I’m wondering what it would take to add pure-CPU support for this and for any other ExternKernel that I might need for ResNet. How would I go about it, or what am I missing?

On the dashboard, the data is shown as aot_inductor now. I gave a talk on AOTInductor at PTC’23; for more information, you can watch the recording at https://www.youtube.com/watch?v=w7d4oWzwZ0c . There is also a tutorial at AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models — PyTorch main documentation. cc @david-macleod

In Inductor, fallback ops are those not lowered by Inductor; for them we call the eager implementation directly, which is relatively easy to do in the Python world but needs extra work when we are generating cpp. For the M1 backend, we haven’t tested it, but I wouldn’t be surprised if extra work is needed. Please give AOTInductor a try and let me know if there is anything I need to follow up on.
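For anyone trying the same exercise, a minimal sketch of a CPU AOTInductor run on ResNet (using torchvision’s resnet18 purely for illustration; convolutions are fallback/extern kernels, so this exercises exactly the path discussed above and may need the extra cpp-side work on your platform):

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

with torch.no_grad():
    # CPU target: success depends on the fallback (extern) kernels,
    # e.g. convolution, having cpp codegen support.
    so_path = torch._export.aot_compile(model, example_inputs)
print(so_path)
```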

Thanks for the info @desertfire! How do you see the relationship between AOTInductor and TorchScript? Specifically, I am interested in knowing whether both will be maintained and built out going forward, as there seems to be some overlap in capabilities now that we have a C++ runtime for Inductor-based optimizations.

We now have AOT Triton-based codegen instead of JIT compilation via NVFuser (as with TorchScript), which has some nice benefits, but AFAIK export/Inductor does not support control flow or the ability to load C++-compatible artefacts back into Python PyTorch (as we can with torchscripted models).

Do you see AOTInductor and TorchScript as different solutions, or as something that will eventually converge? For example, the C++ Inductor wrappers could be modified to be TorchScript extensions.

Cross-posting What's the difference between torch.export / torchserve / executorch / aotinductor? - #7 by desertfire here, to partly answer your question on how an AOTInductor-generated .so can work with TorchScript. If your use case is to have the generated .so work with PyTorch eager, I have recently added some pybind utils to help with that.
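For the eager-interop case, usage looks roughly like the sketch below. I’m assuming a torch._export.aot_load style helper here; the exact name and signature may differ depending on your version:

```python
import torch

# Load the AOTInductor-generated shared library back into Python and
# call it like a regular callable from eager mode.
runner = torch._export.aot_load("model.so", "cpu")
out = runner(torch.randn(4, 10))
```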

For the longer term, we do view export+AOTInductor as a migration path for TorchScript. We understand control-flow support is something TorchScript users really value, and we are working on supporting it in the export+AOTInductor path. Of course, there is still a lot of other work to be done to make the new path mature enough to work as an out-of-the-box solution.
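For what it’s worth, the export-side control-flow primitive being built out looks roughly like this (using torch.cond; in older releases it lived under functorch.experimental.control_flow, so treat the exact import location as an assumption):

```python
import torch

def true_fn(x):
    return x.cos()

def false_fn(x):
    return x.sin()

class Gate(torch.nn.Module):
    def forward(self, x):
        # Data-dependent branch expressed with the control-flow operator,
        # so it can be captured by torch.export instead of graph-breaking.
        return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))

ep = torch.export.export(Gate(), (torch.randn(3),))
print(ep.graph)
```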
