The future of C++ model deployment

On the dashboard, the data now shows up as aot_inductor. I gave a talk on AOTInductor at PTC'23; you can watch the recording at https://www.youtube.com/watch?v=w7d4oWzwZ0c . There is also a tutorial, "AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models", in the PyTorch main documentation. cc @david-macleod

In Inductor, fallback ops are those not lowered by Inductor; for them we call the eager implementation directly, which is relatively easy to do in the Python world but needs extra work when we are generating C++. We haven't tested the M1 backend, so I wouldn't be surprised if extra work is needed there. Please give AOTInductor a try and let me know if there is anything I need to follow up on.
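To illustrate what "fallback" means here, this is a toy sketch (not real Inductor code; the names `LOWERINGS` and `EAGER_IMPLS` are invented for the example): ops with a registered lowering are handled by the compiler, and anything else falls back to calling its eager implementation, which is a one-line dispatch in Python but requires emitting an explicit call into the eager runtime when the output is generated C++.

```python
import math

# Eager implementations exist for every op; lowerings only for some.
EAGER_IMPLS = {"relu": lambda x: max(x, 0.0), "exp": math.exp}
LOWERINGS = {"relu": lambda x: max(x, 0.0)}  # pretend this is compiler-generated

def run_op(name, x):
    if name in LOWERINGS:
        return LOWERINGS[name](x)   # lowered: handled by the compiler
    return EAGER_IMPLS[name](x)     # fallback: call the eager impl directly

print(run_op("relu", -2.0))  # lowered path   → 0.0
print(run_op("exp", 0.0))    # fallback path  → 1.0
```

In Python the fallback branch is just a dictionary lookup and a call; in AOT-compiled C++, each such call has to be materialized as generated code, which is the extra work mentioned above.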