What is the correct, future-proof way of deploying a PyTorch Python model in C++ for inference?

Hi!

I want to understand whether it is possible, and how, to deploy PyTorch models developed and trained in Python, and use them for inference in C++.

From this thread: Pytorch 2 and the c++ interface - #6 by ezyang - C++ - PyTorch Forums
I understand that the PyTorch C++ API is going to be progressively abandoned (a dead end).

But… from https://pytorch.org I can download libtorch:
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcpu.zip

What is the correct, future-proof way of deploying a PyTorch Python model in C++ for inference?
Can you please point me to a working example of a PyTorch Python model deployed in a simple piece of C++ code?

Please give AOTInductor a try: AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models — PyTorch 2.6 documentation
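The Python side of that tutorial boils down to exporting the model with torch.export and packaging it with AOTInductor. A rough sketch (the Net module, shapes, and output path are just placeholders; the packaging call lives under torch._inductor in 2.6):

import os
import torch

# Toy stand-in for a model developed and trained in Python.
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 5)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = Net().eval()
example_inputs = (torch.randn(8, 10),)
# Optionally mark the batch dimension as dynamic.
batch = torch.export.Dim("batch", min=1, max=1024)

with torch.no_grad():
    exported = torch.export.export(
        model, example_inputs, dynamic_shapes={"x": {0: batch}})
    # Ahead-of-time compile and package everything into model.pt2.
    torch._inductor.aoti_compile_and_package(
        exported, package_path=os.path.join(os.getcwd(), "model.pt2"))

The resulting model.pt2 is what the tutorial's inference.cpp then loads on the C++ side (via torch::inductor::AOTIModelPackageLoader), without needing the Python runtime.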


Thank you for pointing me to AOTInductor.

I followed the example step by step, and got this error when compiling the inference code:

(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ CMAKE_PREFIX_PATH=/path/to/python/install/site-packages/torch/share/cmake cmake ..
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:4 (find_package):
  By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Torch", but
  CMake did not find one.

  Could not find a package configuration file provided by "Torch" with any of
  the following names:

    TorchConfig.cmake
    torch-config.cmake

  Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
  "Torch_DIR" to a directory containing one of the above files.  If "Torch"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!

This is what I've done:

(base) raphy@raohy:~$ mkdir AOTInductor
(base) raphy@raohy:~$ cd AOTInductor/
(base) raphy@raohy:~/AOTInductor$ python -m venv .aoti
(base) raphy@raohy:~/AOTInductor$ source .aoti/bin/activate
(.aoti) (base) raphy@raohy:~/AOTInductor$

(.aoti) (base) raphy@raohy:~/AOTInductor$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Looking in indexes: https://download.pytorch.org/whl/cpu
Collecting torch


(.aoti) (base) raphy@raohy:~/AOTInductor/example$ python model.py
/usr/bin/ld: warning: /tmp/torchinductor_raphy/cx7jxbnff2tlwdz2gpv4yy5zoxvd7b6o2t5zekqulqe6zo5ld5vs/ctwashdztcg4lyazvnlkmavrejyhfhfrtcama5gexx73mlv3sp2u/cdxfaagbu5nqhrxwdtuvuvihnixco5qjerruqr26ubzmganyzfeq.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker

@desertfire

 (.aoti) (base) raphy@raohy:~/AOTInductor/example$ ls -lah
total 364K
drwxrwxr-x 2 raphy raphy 4,0K feb  8 16:34 .
drwxrwxr-x 4 raphy raphy 4,0K feb  8 16:29 ..
-rw-rw-r-- 1 raphy raphy  393 feb  8 16:34 CMakeLists.txt
-rw-rw-r-- 1 raphy raphy  937 feb  8 16:34 inference.cpp
-rw-rw-r-- 1 raphy raphy 342K feb  8 16:33 model.pt2
-rw-rw-r-- 1 raphy raphy 1,5K feb  8 16:32 model.py

How can I make it work?

/path/to/python/install/site-packages/torch/share/cmake is only an example path. You need to find your corresponding local PyTorch install path in order to make that work.

You can get this programmatically via:

python -c "import os; import torch; print(os.path.join(os.path.dirname(torch.__file__), 'share/cmake'))"

@desertfire The previous error was due to my silly mistake. Solved. But now I get a different kind of error:

(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ CMAKE_PREFIX_PATH=/home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake cmake ..
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning at /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:121 (append_torchlib_if_found)
  CMakeLists.txt:4 (find_package)


-- Found Torch: /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/lib/libtorch.so
-- Configuring done (0.5s)
-- Generating done (0.0s)
-- Build files have been written to: /home/raphy/AOTInductor/example/build



(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$  cmake --build . --config Release
[ 50%] Building CXX object CMakeFiles/aoti_example.dir/inference.cpp.o
[100%] Linking CXX executable aoti_example
[100%] Built target aoti_example


(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ ls
aoti_example  CMakeCache.txt  CMakeFiles  cmake_install.cmake  Makefile


(.aoti) (base) raphy@raohy:~/AOTInductor/example/build$ ./aoti_example 
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to initialize zip archive: file open failed
Aborted (core dumped)

Is this initial CMake warning related?

CMake Warning at /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/raphy/AOTInductor/.aoti/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:121 (append_torchlib_if_found)
  CMakeLists.txt:4 (find_package)

I searched for info about this warning and found a recent, unsolved GitHub issue:

How can I make it work?

I don’t think that warning is relevant.

Did your Python model and C++ inference use the same backend, e.g. both CPU or both CUDA?

Both CPU. I don't even have a working GPU on the PC I'm using.

If you're CPU-only, ExecuTorch, the edge-focused runtime, might be a good solution for you as well: Setting Up ExecuTorch — ExecuTorch 0.5 documentation
It is also a runtime for export-ed models. It is more constrained (you don't have access to the full libtorch), but it has a much smaller footprint (a runtime on the order of kilobytes).
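To get a feel for the flow, producing a minimal .pte looks roughly like this (a sketch following the getting-started docs; the Add module and file name are just the toy example used there):

import torch
from torch.export import export
from executorch.exir import to_edge

# Trivial module from the getting-started walkthrough.
class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

example_args = (torch.ones(1), torch.ones(1))
aten_program = export(Add(), example_args)         # torch.export -> ExportedProgram
edge_program = to_edge(aten_program)               # ATen dialect -> Edge dialect
executorch_program = edge_program.to_executorch()  # Edge dialect -> ExecuTorch program
with open("add.pte", "wb") as f:
    f.write(executorch_program.buffer)

The resulting add.pte can then be run with ./cmake-out/executor_runner --model_path add.pte.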


Hi!
I've been following the instructions here: Setting Up ExecuTorch — ExecuTorch 0.5 documentation

(executorch) raphy@raohy:~/executorch$ ./cmake-out/executor_runner --model_path ../example_files/add.pte
I 00:00:00.000294 executorch:executor_runner.cpp:82] Model file ../example_files/add.pte is loaded.
I 00:00:00.000308 executorch:executor_runner.cpp:91] Using method forward
I 00:00:00.000317 executorch:executor_runner.cpp:138] Setting up planned buffer 0, size 48.
I 00:00:00.000350 executorch:executor_runner.cpp:161] Method loaded.
I 00:00:00.000369 executorch:executor_runner.cpp:171] Inputs prepared.
I 00:00:00.000395 executorch:executor_runner.cpp:180] Model executed successfully.
I 00:00:00.000399 executorch:executor_runner.cpp:184] 1 outputs:
Output 0: tensor(sizes=[1], [2.])

But now I want to convert, and then execute with ExecuTorch, the following fine-tuned model: Fine-Tuning-BERT-for-Named-Entity-Recognition/BERTfineTunningFinal.ipynb at main · tozameerkhan/Fine-Tuning-BERT-for-Named-Entity-Recognition · GitHub.

I've already fine-tuned the model and saved it as safetensors:

(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah
total 132K
drwxrwxr-x   7 raphy raphy 4,0K feb 19 14:49  .
drwxr-x--- 156 raphy raphy  12K feb 19 13:31  ..
-rw-rw-r--   1 raphy raphy 5,3K feb 18 11:50 '=0.26.0'
-rw-rw-r--   1 raphy raphy 8,3K feb 19 14:35  BERT-NER-ExportableToExecuteTorch.py
-rw-rw-r--   1 raphy raphy 8,2K feb 18 15:42  BERT-NER.py
drwxrwxr-x   6 raphy raphy 4,0K feb 18 11:41  .bftner
drwxrwxr-x   2 raphy raphy 4,0K feb 18 14:47  ner_model
drwxrwxr-x   5 raphy raphy 4,0K feb 18 17:14  results
drwxrwxr-x   2 raphy raphy 4,0K feb 18 14:47  tokenizer
(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$


(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah ./ner_model/
total 416M
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:47 .
drwxrwxr-x 7 raphy raphy 4,0K feb 19 15:24 ..
-rw-rw-r-- 1 raphy raphy  896 feb 18 18:47 config.json
-rw-rw-r-- 1 raphy raphy 416M feb 18 18:47 model.safetensors
(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah ./results/
total 20K
drwxrwxr-x 5 raphy raphy 4,0K feb 18 17:14 .
drwxrwxr-x 7 raphy raphy 4,0K feb 19 15:24 ..
drwxrwxr-x 2 raphy raphy 4,0K feb 18 17:14 checkpoint-1756
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:03 checkpoint-2634
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:47 checkpoint-3512
(.bftner) (base) raphy@raohy:~/BertFineTuningForNERPyTorch$ ls -lah ./tokenizer/
total 940K
drwxrwxr-x 2 raphy raphy 4,0K feb 18 14:47 .
drwxrwxr-x 7 raphy raphy 4,0K feb 19 15:24 ..
-rw-rw-r-- 1 raphy raphy  125 feb 18 18:47 special_tokens_map.json
-rw-rw-r-- 1 raphy raphy 1,2K feb 18 18:47 tokenizer_config.json
-rw-rw-r-- 1 raphy raphy 695K feb 18 18:47 tokenizer.json
-rw-rw-r-- 1 raphy raphy 227K feb 18 18:47 vocab.txt

I'm confused and lost. How should I proceed now to convert this fine-tuned model into one that ExecuTorch can execute in a desktop environment?


Hi @raphael10-collab, thanks for giving ExecuTorch a try. The workflow is as follows:

  • Load your .safetensors into your BERT model (a torch.nn.Module):

    from safetensors.torch import load_model

    load_model(model, "model.safetensors")
    # Instead of model.load_state_dict(load_file("model.safetensors"))

  • Use torch.export() to export the torch.nn.Module into an ExportedProgram. You need to prepare the example input and dynamic shape information. See this wiki for instructions. Some example code:

    import torch

    ep = torch.export.export(model, example_args)

  • Then you need to use the ExecuTorch APIs to export the ExportedProgram into a .pte file. Please find the instructions here. You can look at the example in export_hf_util.py (it may not directly apply to your case); your code will look similar to this:

    from executorch.exir import to_edge
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

    program = to_edge(ep).to_backend(XnnpackPartitioner()).to_executorch()
    filename = "model.pte"
    with open(filename, "wb") as f:
        program.write_to_file(f)

  • Once you have model.pte, you can run it with ./cmake-out/executor_runner:

    ./cmake-out/executor_runner --model_path model.pte
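Putting these steps together for your fine-tuned BERT NER model, the export script would look roughly like the sketch below. It assumes a Hugging Face BertForTokenClassification loaded from your ner_model/ directory and a fixed-length example input; dynamic sequence lengths (via torch.export.Dim) and the tweaks HF models sometimes need for torch.export are left out:

import torch
from transformers import BertForTokenClassification, BertTokenizerFast
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# 1. Rebuild the fine-tuned model; from_pretrained reads config.json and model.safetensors.
#    (Equivalently, build the model from the config and call safetensors.torch.load_model.)
model = BertForTokenClassification.from_pretrained("./ner_model")
model.eval()

# 2. Build a fixed-length example input with the saved tokenizer.
tokenizer = BertTokenizerFast.from_pretrained("./tokenizer")
enc = tokenizer("John lives in Berlin", padding="max_length",
                max_length=128, truncation=True, return_tensors="pt")
example_args = (enc["input_ids"], enc["attention_mask"])

# 3. torch.export -> ExportedProgram.
with torch.no_grad():
    ep = torch.export.export(model, example_args)

# 4. Lower to the Edge dialect, delegate to XNNPACK, and write model.pte.
program = to_edge(ep).to_backend(XnnpackPartitioner()).to_executorch()
with open("model.pte", "wb") as f:
    program.write_to_file(f)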

Hope this helps. You can also create issues on GitHub or join our Discord channel!