What's the difference between torch.export / TorchServe / ExecuTorch / AOTInductor?

I’m watching and learning from the PyTorch Conference 2023 talks, and I keep encountering these concepts. They all seem to be related to deploying models.

To the best of my understanding:

The most confusing thing to me is AOTInductor. It seems to be more of a concept: I don’t know whether it is a new backend, a new repo, or a new piece of functionality.

Can anyone help me out? It would be great if someone could provide some concrete code showing what AOTInductor is.

And any other help with understanding torch.export/TorchServe/ExecuTorch would also be great!

Might be relevant to @desertfire.

Hi Youkai, your understanding is pretty much correct; just a few minor corrections:

  1. torch.export is not a function; it is a Python module containing torch.export.export (I know …) and other utilities that export needs (see the sketch after this list).
  2. ExecuTorch always requires a model produced by torch.export, not just "might use" one.
  3. I am a bit behind on AOTInductor updates. It is currently a prototype, and it is new functionality for running TorchInductor in an ahead-of-time fashion. I don’t think it will be a new repo. @desertfire will have the definitive answer here.
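
To make item 1 concrete, here is a minimal sketch of calling torch.export.export (the model and shapes are made up for illustration, and the API was still a prototype at the time, so details may differ in your PyTorch version):

```python
import torch

class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

# torch.export is the module; torch.export.export is the actual entry point.
# It traces the model into an ExportedProgram holding a standardized graph IR.
example_inputs = (torch.randn(4, 8),)
exported = torch.export.export(MyModel(), example_inputs)
print(exported.graph_module.graph)  # inspect the captured graph
```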

Glad to see you are interested in AOTInductor. It is in the prototype stage and is being actively developed. The AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models page in the PyTorch main documentation contains an example of how to use it. Please give it a try and let me know if you have any questions.


Thanks, that’s clear! So AOTInductor’s entry point is torch._export.aot_compile, and it is part of the export stack, compiling the exported model into a standalone .so file.
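
For anyone else landing here, this is roughly what that looks like, based on the linked docs (the model is made up, and torch._export.aot_compile is a prototype API whose signature may have changed since):

```python
import torch

class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.sin(x) + torch.cos(x)

model = MyModel().eval()
example_inputs = (torch.randn(8, 16),)

# Compile the exported model ahead of time into a shared library on disk.
with torch.no_grad():
    so_path = torch._export.aot_compile(model, example_inputs)

print(so_path)  # path to the generated .so, loadable from C++ (or Python)
```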

Thanks for the great talk at PyTorch Conf!

Quick question on the interplay between TorchScript and AOTInductor: can you explain what the two tests here are testing, i.e., the difference between an AOT-compiled module that is scripted vs. one that is not, and the benefits / use cases of each?

Yes, the test you pointed to contains cases where an AOTInductor-compiled .so file can be wrapped with torch::CustomClassHolder and thus work with an existing TorchScript inference solution. It is a showcase for users who are currently leveraging TorchScript for their inference. If your inference system was not built on top of TorchScript, you can use plain C++ deployment with the AOTInductor-generated .so file.

Very interesting to follow this! Do I understand correctly that torch::jit::script::Module will be used in the future as the class that holds loaded .so files exported by AOTInductor (even if the name is a bit misleading, since no "scripting" is involved anymore)? And will we be able to move the loaded .so files between devices as we could with the old TorchScript workflow?

Calling the .so file from TorchScript is just one way to use AOTI. It is doable, as shown in the test, but given that TorchScript is in maintenance mode, we will not invest further in that path.

To your second question: I am not sure I understand you correctly, but you can specify a device string, e.g. "cuda:2", when creating a runner, as shown here: pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp at 306642b66d9993fdd4ac736af285aa688174b6ce · pytorch/pytorch · GitHub
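
For reference, the Python-side equivalent looks roughly like this, assuming the prototype torch._export.aot_load helper (which wraps the AOTI runner) is present in your build; the .so path and device string below are placeholders:

```python
import torch

# Hypothetical artifact produced earlier by torch._export.aot_compile.
so_path = "/tmp/my_model.so"

# aot_load binds the compiled model container to one device at load time.
runner = torch._export.aot_load(so_path, device="cuda:2")

x = torch.randn(8, 16, device="cuda:2")
out = runner(x)  # runs the ahead-of-time compiled kernels on that device
```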

Thanks for the clarification! As I understand it, there will be three ways to bring PyTorch models to C++:

  1. Export a .so file with AOTI and load it via the runner (the same .so file can be loaded onto different devices like CPU, CUDA or MPS, if I understand your last post correctly, is that right?).
  2. Load the .so file with TorchScript (maintenance mode).
  3. Capture the graph with TorchDynamo, export it to TorchScript, and load the TorchScript file from C++ as usual (see this comment: What is the recommend serialization format when considering the upcoming pt2?)

Will this work on Windows and macOS as well?

  1. The .so file needs to be compiled for a specific device. You won’t be able to run a .so built for a CUDA model on CPU.
  2. It works in theory, but it is not tested/maintained. This approach still requires TorchDynamo/torch.export to export a single graph, which is not always possible for every model. If your model does export to a single graph, I don’t see how this choice adds value compared to 1 or 2.

Thanks for your reply! The formatting probably removed the number for your last sentence, which I guess refers to 3). I fixed it in my reply above.

The nice thing about TorchScript is that we can export a single model file from any host and deploy it to any other host, across macOS, Windows and Ubuntu. This is a very strong feature. Will there be a replacement for this in the AOTInductor world?

Will there be a replacement for this in the AOTInductor world?

No. AOTInductor aims to compile a model to a native binary. It doesn’t have a runtime that takes some kind of serialized model representation and runs it on different devices. However, AOTInductor does support different backends (CPU, CUDA, etc.). I can imagine higher-level wrapper code being created to decide which model.so to call into.
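
As a rough illustration of that last point (this is not an existing PyTorch API, just a sketch): you could ship one compiled .so per backend and select between them at load time. The file names and the prototype aot_load helper here are assumptions:

```python
import torch

# Hypothetical per-backend artifacts, each produced by a separate
# torch._export.aot_compile run with the matching hardware/toolchain.
COMPILED_MODELS = {
    "cuda": "model_cuda.so",
    "cpu": "model_cpu.so",
}

def load_best_model():
    # Pick the artifact that matches the host, then load it with the
    # prototype aot_load helper (assumed available in this build).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch._export.aot_load(COMPILED_MODELS[device], device=device), device

model, device = load_best_model()
out = model(torch.randn(8, 16, device=device))
```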

I see! And do you know whether there is a path towards a device-agnostic export workflow ahead, or will there be a point in time when TorchScript is finally deprecated and no replacement is available? We had the use case of a heterogeneous server fleet above, but what about the case where we ship models to arbitrary users and they run them locally via libtorch? In that case it is impossible to compile all the needed variants beforehand.

We will not deprecate TorchScript without a suitable (and technically superior) replacement.

I think the key missing piece, which we are developing but have not yet released, is a generic interpreted runtime that uses libtorch to execute the graph in a target-independent way, optionally calling out to compiled artifacts for acceleration.

So the proposed TorchScript replacement flow would be:

On the frontend:
torch.export → compile subgraphs/whole graph with inductor → packaged model (graph, plus any compiled artifacts)

On the server:
The runtime loads the packaged model and executes it, selecting between interpretation and compiled artifacts as appropriate for the host environment.

Does that picture fit with what you would expect?


Hey @suo, this sounds exactly like the solution needed. It would be nice to make the compiled artifacts optional: in our case we ship 5GB+ TorchScript files to Windows, macOS and Ubuntu, and serving them with compiled, optimized versions would triple the file size, which wouldn’t be worth the speedup gained.

What would be the best place to track ongoing development on the TorchScript successor and provide input to the developers, both bug reports and feature requests?

Awesome!

What would be the best place to track ongoing development on the TorchScript successor and provide input to the developers, both bug reports and feature requests?

We don’t have the runtime available at the moment, but you can give feedback on the frontend bits (torch.export, the ExportedProgram serialized format). I’ll update this thread when we have something more concrete to share on the runtime/backend side.
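
In the meantime, a minimal sketch of exercising those frontend bits, assuming the prototype torch.export.save / torch.export.load serialization APIs (the model and file name are placeholders):

```python
import torch

class MyModel(torch.nn.Module):
    def forward(self, x):
        return x * 2 + 1

# Export to an ExportedProgram, serialize it, and round-trip it back.
ep = torch.export.export(MyModel(), (torch.randn(4),))
torch.export.save(ep, "my_model.pt2")    # serialized ExportedProgram artifact
loaded = torch.export.load("my_model.pt2")
print(loaded.module()(torch.randn(4)))   # run the re-loaded graph eagerly
```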
