The future of C++ model deployment

Most of the Inductor codegen changes are already on the main branch. You can get a sense of them from the history of pytorch/torch/_inductor/codegen/wrapper.py on the main branch of pytorch/pytorch on GitHub. The runtime part, which lets it interact with TorchScript, is still being worked on and should be coming fairly soon.

Meanwhile, we have a dashboard that monitors the inference performance of the cpp_wrapper codegen: the PyTorch CI HUD (select inference as the Mode and look for inductor_cpp_wrapper). Although that path still uses Python, it is a good proxy for how robust and performant AOTInductor can be.