The future of C++ model deployment

On the dashboard, the data now shows up as aot_inductor. I gave a talk on AOTInductor at PTC'23; you can watch the recording at https://www.youtube.com/watch?v=w7d4oWzwZ0c . There is also a tutorial, "AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models", in the PyTorch main documentation. cc @david-macleod

In Inductor, fallback ops are those not lowered by Inductor; for them we call the eager implementation directly, which is relatively easy to do in the Python world but needs extra work when we are generating C++. We haven't tested the M1 backend, so I wouldn't be surprised if extra work is needed there. Please give AOTInductor a try and let me know if there is anything I need to follow up on.
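To illustrate what "fallback" means here, this is a toy sketch (not real Inductor code; the names `LOWERINGS` and `EAGER_IMPLS` are invented for the example): ops with a registered lowering are handled by the compiler, and anything else falls back to calling its eager implementation, which is a one-line dispatch in Python but requires emitting an explicit call into the eager runtime when the output is generated C++.

```python
import math

# Eager implementations exist for every op; lowerings only for some.
EAGER_IMPLS = {"relu": lambda x: max(x, 0.0), "exp": math.exp}
LOWERINGS = {"relu": lambda x: max(x, 0.0)}  # pretend this is compiler-generated

def run_op(name, x):
    if name in LOWERINGS:
        return LOWERINGS[name](x)   # lowered: handled by the compiler
    return EAGER_IMPLS[name](x)     # fallback: call the eager impl directly

print(run_op("relu", -2.0))  # lowered path   → 0.0
print(run_op("exp", 0.0))    # fallback path  → 1.0
```

In Python the fallback branch is just a dictionary lookup and a call; in AOT-compiled C++, each such call has to be materialized as generated code, which is the extra work mentioned above.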