What is the correct, future-proof way of deploying a PyTorch Python model in C++ for inference?

Hi!

I want to understand whether it is possible, and how, to deploy PyTorch models developed and trained in Python, and use them for inference in C++.

From this thread: Pytorch 2 and the c++ interface - #6 by ezyang - C++ - PyTorch Forums
I understand that the PyTorch C++ API is going to be progressively abandoned (a dead end).

But… from https://pytorch.org I can download libtorch:
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcpu.zip

What is the correct, future-proof way of deploying a PyTorch Python model in C++ for inference?
Can you please point me to a working example of a PyTorch Python model deployed in a simple piece of C++ code?

Please give AOTInductor a try: AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models — PyTorch 2.6 documentation
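The Python side of that tutorial boils down to exporting the model with torch.export and packaging it with AOTInductor. A rough sketch (the Net module, shapes, and output path are just placeholders; the packaging call lives under torch._inductor in 2.6):

import os
import torch

# Toy stand-in for a model developed and trained in Python.
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 5)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = Net().eval()
example_inputs = (torch.randn(8, 10),)
# Optionally mark the batch dimension as dynamic.
batch = torch.export.Dim("batch", min=1, max=1024)

with torch.no_grad():
    exported = torch.export.export(
        model, example_inputs, dynamic_shapes={"x": {0: batch}})
    # Ahead-of-time compile and package everything into model.pt2.
    torch._inductor.aoti_compile_and_package(
        exported, package_path=os.path.join(os.getcwd(), "model.pt2"))

The resulting model.pt2 is what the tutorial's inference.cpp then loads on the C++ side (via torch::inductor::AOTIModelPackageLoader), without needing the Python runtime.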


Thank you for pointing me to AOTInductor.

I followed the example step by step, and got this error when compiling the inference code:

(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ CMAKE_PREFIX_PATH=/path/to/python/install/site-packages/torch/share/cmake cmake ..
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:4 (find_package):
  By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Torch", but
  CMake did not find one.

  Could not find a package configuration file provided by "Torch" with any of
  the following names:

    TorchConfig.cmake
    torch-config.cmake

  Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
  "Torch_DIR" to a directory containing one of the above files.  If "Torch"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!

This is what I've done:

(base) raphy@raohy:~$ mkdir AOTInductor
(base) raphy@raohy:~$ cd AOTInductor/
(base) raphy@raohy:~/AOTInductor$ python -m venv .aoti
(base) raphy@raohy:~/AOTInductor$ source .aoti/bin/activate
(.aoti) (base) raphy@raohy:~/AOTInductor$

(.aoti) (base) raphy@raohy:~/AOTInductor$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Looking in indexes: https://download.pytorch.org/whl/cpu
Collecting torch


(.aoti) (base) raphy@raohy:~/AOTInductor/example$ python model.py
/usr/bin/ld: warning: /tmp/torchinductor_raphy/cx7jxbnff2tlwdz2gpv4yy5zoxvd7b6o2t5zekqulqe6zo5ld5vs/ctwashdztcg4lyazvnlkmavrejyhfhfrtcama5gexx73mlv3sp2u/cdxfaagbu5nqhrxwdtuvuvihnixco5qjerruqr26ubzmganyzfeq.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker

@desertfire

 (.aoti) (base) raphy@raohy:~/AOTInductor/example$ ls -lah
total 364K
drwxrwxr-x 2 raphy raphy 4,0K feb  8 16:34 .
drwxrwxr-x 4 raphy raphy 4,0K feb  8 16:29 ..
-rw-rw-r-- 1 raphy raphy  393 feb  8 16:34 CMakeLists.txt
-rw-rw-r-- 1 raphy raphy  937 feb  8 16:34 inference.cpp
-rw-rw-r-- 1 raphy raphy 342K feb  8 16:33 model.pt2
-rw-rw-r-- 1 raphy raphy 1,5K feb  8 16:32 model.py

How can I make it work?

/path/to/python/install/site-packages/torch/share/cmake is only an example path. You need to find your corresponding local PyTorch install path in order to make that work.

You can get this programmatically via:

python -c "import os; import torch; print(os.path.join(os.path.dirname(torch.__file__), 'share/cmake'))"

@desertfire The previous error was due to my silly mistake. Solved. But now I get a different kind of error:

(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ CMAKE_PREFIX_PATH=/home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake cmake ..
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning at /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:121 (append_torchlib_if_found)
  CMakeLists.txt:4 (find_package)


-- Found Torch: /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/lib/libtorch.so
-- Configuring done (0.5s)
-- Generating done (0.0s)
-- Build files have been written to: /home/raphy/AOTInductor/example/build



(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$  cmake --build . --config Release
[ 50%] Building CXX object CMakeFiles/aoti_example.dir/inference.cpp.o
[100%] Linking CXX executable aoti_example
[100%] Built target aoti_example


(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ ls
aoti_example  CMakeCache.txt  CMakeFiles  cmake_install.cmake  Makefile


(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ ./aoti_example 
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to initialize zip archive: file open failed
Aborted (core dumped)

Is this initial CMake warning related?

CMake Warning at /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:121 (append_torchlib_if_found)
  CMakeLists.txt:4 (find_package)

I searched for info about this warning and found a recent, unsolved GitHub issue:

How can I make it work?

I don’t think that warning is relevant.

Did your Python model and C++ inference use the same backend, e.g. both CPU or both CUDA?

Both CPU. I don't even have a working GPU on the PC I'm using.

If you're CPU-only, ExecuTorch, the edge-focused runtime, might be a good solution for you as well: Setting Up ExecuTorch — ExecuTorch 0.5 documentation
It is also a runtime for export-ed models. It is more constrained (you don't have access to the full libtorch), but it has a much smaller footprint (a runtime on the order of kilobytes).
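To get a feel for the flow, producing a minimal .pte looks roughly like this (a sketch following the getting-started docs; the Add module and file name are just the toy example used there):

import torch
from torch.export import export
from executorch.exir import to_edge

# Trivial module from the getting-started walkthrough.
class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

example_args = (torch.ones(1), torch.ones(1))
aten_program = export(Add(), example_args)         # torch.export -> ExportedProgram
edge_program = to_edge(aten_program)               # ATen dialect -> Edge dialect
executorch_program = edge_program.to_executorch()  # Edge dialect -> ExecuTorch program
with open("add.pte", "wb") as f:
    f.write(executorch_program.buffer)

The resulting add.pte can then be run with ./cmake-out/executor_runner --model_path add.pte.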


Hi!
I've been following the instructions here: Setting Up ExecuTorch — ExecuTorch 0.5 documentation

(executorch) raphy@raohy:~/executorch$ ./cmake-out/executor_runner --model_path ../example_files/add.pte
I 00:00:00.000294 executorch:executor_runner.cpp:82] Model file ../example_files/add.pte is loaded.
I 00:00:00.000308 executorch:executor_runner.cpp:91] Using method forward
I 00:00:00.000317 executorch:executor_runner.cpp:138] Setting up planned buffer 0, size 48.
I 00:00:00.000350 executorch:executor_runner.cpp:161] Method loaded.
I 00:00:00.000369 executorch:executor_runner.cpp:171] Inputs prepared.
I 00:00:00.000395 executorch:executor_runner.cpp:180] Model executed successfully.
I 00:00:00.000399 executorch:executor_runner.cpp:184] 1 outputs:
Output 0: tensor(sizes=[1], [2.])

But now I want to convert, and then execute with ExecuTorch, the following fine-tuned model: Fine-Tuning-BERT-for-Named-Entity-Recognition/BERTfineTunningFinal.ipynb at main · tozameerkhan/Fine-Tuning-BERT-for-Named-Entity-Recognition · GitHub.

I've already fine-tuned the model and saved it as safetensors:

(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah
total 132K
drwxrwxr-x   7 raphy raphy 4,0K feb 19 14:49  .
drwxr-x--- 156 raphy raphy  12K feb 19 13:31  ..
-rw-rw-r--   1 raphy raphy 5,3K feb 18 11:50 '=0.26.0'
-rw-rw-r--   1 raphy raphy 8,3K feb 19 14:35  BERT-NER-ExportableToExecuteTorch.py
-rw-rw-r--   1 raphy raphy 8,2K feb 18 15:42  BERT-NER.py
drwxrwxr-x   6 raphy raphy 4,0K feb 18 11:41  .bftner
drwxrwxr-x   2 raphy raphy 4,0K feb 18 14:47  ner_model
drwxrwxr-x   5 raphy raphy 4,0K feb 18 17:14  results
drwxrwxr-x   2 raphy raphy 4,0K feb 18 14:47  tokenizer
(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$


(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah ./ner_model/
total 416M
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:47 .
drwxrwxr-x 7 raphy raphy 4,0K feb 19 15:24 ..
-rw-rw-r-- 1 raphy raphy  896 feb 18 18:47 config.json
-rw-rw-r-- 1 raphy raphy 416M feb 18 18:47 model.safetensors
(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah ./results/
total 20K
drwxrwxr-x 5 raphy raphy 4,0K feb 18 17:14 .
drwxrwxr-x 7 raphy raphy 4,0K feb 19 15:24 ..
drwxrwxr-x 2 raphy raphy 4,0K feb 18 17:14 checkpoint-1756
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:03 checkpoint-2634
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:47 checkpoint-3512
(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah ./tokenizer/
total 940K
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:47 .
drwxrwxr-x 7 raphy raphy 4,0K feb 19 15:24 ..
-rw-rw-r-- 1 raphy raphy  125 feb 18 18:47 special_tokens_map.json
-rw-rw-r-- 1 raphy raphy 1,2K feb 18 18:47 tokenizer_config.json
-rw-rw-r-- 1 raphy raphy 695K feb 18 18:47 tokenizer.json
-rw-rw-r-- 1 raphy raphy 227K feb 18 18:47 vocab.txt

I'm confused and lost. How should I proceed now to convert this fine-tuned model into one that ExecuTorch can execute in a desktop environment?


Hi @raphael10-collab, thanks for giving ExecuTorch a try. The workflow is as follows:

  • Load your .safetensors into your BERT model (a torch.nn.Module):

    from safetensors.torch import load_model

    load_model(model, "model.safetensors")
    # Instead of model.load_state_dict(load_file("model.safetensors"))

  • Use torch.export() to export the torch.nn.Module into an ExportedProgram. You need to prepare the example input and dynamic shape information. See this wiki for instructions. Some example code:

    import torch

    ep = torch.export.export(model, example_args)

  • Then you need to use the ExecuTorch APIs to export the ExportedProgram into a .pte file. Please find the instructions here. You can look at the example in export_hf_util.py (it may not directly apply to your case); your code will look similar to this:

    from executorch.exir import to_edge
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

    program = to_edge(ep).to_backend(XnnpackPartitioner()).to_executorch()
    filename = "model.pte"
    with open(filename, "wb") as f:
        program.write_to_file(f)

  • Once you have model.pte, you can run it with ./cmake-out/executor_runner:

    ./cmake-out/executor_runner --model_path model.pte
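Putting these steps together for your fine-tuned BERT NER model, the export script would look roughly like the sketch below. It assumes a Hugging Face BertForTokenClassification loaded from your ner_model/ directory and a fixed-length example input; dynamic sequence lengths (via torch.export.Dim) and the tweaks HF models sometimes need for torch.export are left out:

import torch
from transformers import BertForTokenClassification, BertTokenizerFast
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# 1. Rebuild the fine-tuned model; from_pretrained reads config.json and model.safetensors.
#    (Equivalently, build the model from the config and call safetensors.torch.load_model.)
model = BertForTokenClassification.from_pretrained("./ner_model")
model.eval()

# 2. Build a fixed-length example input with the saved tokenizer.
tokenizer = BertTokenizerFast.from_pretrained("./tokenizer")
enc = tokenizer("John lives in Berlin", padding="max_length",
                max_length=128, truncation=True, return_tensors="pt")
example_args = (enc["input_ids"], enc["attention_mask"])

# 3. torch.export -> ExportedProgram.
with torch.no_grad():
    ep = torch.export.export(model, example_args)

# 4. Lower to the Edge dialect, delegate to XNNPACK, and write model.pte.
program = to_edge(ep).to_backend(XnnpackPartitioner()).to_executorch()
with open("model.pte", "wb") as f:
    program.write_to_file(f)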

Hope this helps. You can also create issues on GitHub or join our Discord channel!