Exporting a model containing backpropagation to onnx

DavidMinorNV · April 2, 2024, 6:15pm

Hello,

We’ve got a model containing a bunch of things like transformers, slicing, indexing with arrays and concatenation, plus, most awkwardly, a call to torch.autograd.grad(). I’ve been trying to export it to onnx for a while, and I’ve filed a few tickets about issues I’ve encountered along the way. Many of them are linked from this one:

github.com/pytorch/pytorch issue: "Problems differentiating through a transformer when exporting to onnx" (opened by DavidMinorNV at 11:08 PM UTC on 05 Mar 24; labels: module: onnx, triaged)

### 🐛 Describe the bug

I'm trying to export a model to onnx involving back propagation, and I've run into a number of issues. I can work around some of them, but this one seems a bit tough. You can see the kind of problem I've encountered if you run this python code:

```python
import torch


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.latent_dim = 256
        self.num_heads = 4
        self.ff_size = 1024
        self.dropout = 0.1
        self.activation = "gelu"
        self.num_layers = 4
        root_seqTransEncoderLayer = torch.nn.TransformerEncoderLayer(
            d_model=self.latent_dim,
            nhead=self.num_heads,
            dim_feedforward=self.ff_size,
            dropout=self.dropout,
            activation=self.activation,
        )
        self.root_seqTransEncoder = torch.nn.TransformerEncoder(
            root_seqTransEncoderLayer, num_layers=self.num_layers
        )

    def forward(self, inputs):
        # Differentiate only with respect to a detached copy of the input
        xseq = inputs[0]
        xseq = xseq.detach().requires_grad_()
        with torch.enable_grad():
            output = self.root_seqTransEncoder(xseq)
            loss = torch.sqrt(output).sum()
        # The model's output is the gradient of the loss w.r.t. the input
        return torch.autograd.grad([loss], [xseq])[0]


mdl = Model()
# Freeze all parameters so that only the input needs a gradient
for p in mdl.parameters():
    p.requires_grad_(False)

print("export model")
torch.onnx.export(
    mdl,  # export the frozen instance created above
    [torch.randn([20, 2, 256]) ** 2],
    "modelthing.onnx",
    input_names=["xseq"],
    opset_version=17,
    output_names=["lossgrad"],
    verbose=True,
)
```

This probably overlaps with https://github.com/pytorch/pytorch/issues/120820 and https://github.com/pytorch/pytorch/issues/120822, as this is basically what I was trying to do when I found those bugs. There are some things I can work around (e.g. the backward pass of a GELU unit is unsupported, but I can implement that myself using a torch.onnx API), but some things look a lot harder to work around.

This particular export fails with an error saying that it's trying to insert a parameter as a constant when it requires a gradient. I've turned requires_grad off for all the model's parameters, though, so I think it's erroneously trying to insert an intermediate value as a constant, as it does in https://github.com/pytorch/pytorch/issues/120820 (I found that bug while trying to strip this one down). Fixing that issue will probably reveal the layer norm problem I reported in https://github.com/pytorch/pytorch/issues/120822, the fact that the backward pass for the GELU nonlinearity isn't implemented (at least that's something I can work around), and probably some other things.

### Versions

```
Collecting environment information...
PyTorch version: 2.2.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.25.1
Libc version: N/A

Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 10.0.130
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090 Ti
Nvidia driver version: 536.23
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3501
DeviceID=CPU0
Family=107
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3501
Name=AMD Ryzen Threadripper PRO 3975WX 32-Cores
ProcessorType=3
Revision=12544

Versions of relevant libraries:
[pip3] functorch==2.0.0
[pip3] lovely-numpy==0.2.8
[pip3] numpy==1.24.3
[pip3] onnx==1.14.1
[pip3] onnx-graphsurgeon==0.3.27
[pip3] onnxconverter-common==1.13.0
[pip3] onnxruntime==1.15.1
[pip3] optree==0.10.0
[pip3] pytorch-lightning==1.4.2
[pip3] tf2onnx==1.16.0
[pip3] torch==2.2.1+cu118
[pip3] torch-cluster==1.6.1
[pip3] torch-fidelity==0.3.0
[pip3] torch-geometric==2.3.0
[pip3] torch-scatter==2.1.1
[pip3] torch-sparse==0.6.17
[pip3] torch-spline-conv==1.2.2
[pip3] torchaudio==2.2.1+cu118
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.17.1+cu118
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.6.0 hc0ea762_10 conda-forge
[conda] libblas 3.9.0 16_win64_mkl conda-forge
[conda] libcblas 3.9.0 16_win64_mkl conda-forge
[conda] liblapack 3.9.0 16_win64_mkl conda-forge
[conda] mkl 2022.1.0 h6a75c08_874 conda-forge
[conda] mkl-include 2023.2.0 intel_49496 intel
[conda] mkl-static 2023.2.0 intel_49496 intel
[conda] numpy 1.24.2 py310hd02465a_0 conda-forge
[conda] pytorch 1.12.0 py3.10_cuda11.6_cudnn8_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
```
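The issue notes that the exporter does not support the GELU backward pass, and that the author worked around this with a torch.onnx API. A different, purely PyTorch-side workaround is to pass a hand-written GELU to TransformerEncoderLayer so that autograd differentiates through elementwise ops instead of GELU's dedicated backward. This is only a sketch, and it assumes the exporter can trace the backward of erf, multiplication and addition even where it cannot trace GELU's own backward. `manual_gelu` is a hypothetical helper. The check at the end confirms only that its values and gradients match the built-in exact GELU. It does not show that the export succeeds.

```python
import math

import torch
import torch.nn.functional as F


def manual_gelu(x: torch.Tensor) -> torch.Tensor:
    # Exact (erf-based) GELU built from elementwise ops, so autograd
    # differentiates through mul/erf/add rather than GELU's own backward
    # (assumption: those backward ops are the ones the exporter can handle).
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))


# Pass the callable instead of the string "gelu" (same sizes as the repro).
layer = torch.nn.TransformerEncoderLayer(
    d_model=256,
    nhead=4,
    dim_feedforward=1024,
    dropout=0.1,
    activation=manual_gelu,
)

# Sanity check: input gradients match the built-in exact GELU.
torch.manual_seed(0)
x = torch.randn(8, requires_grad=True)
(g_manual,) = torch.autograd.grad(manual_gelu(x).sum(), [x])
(g_ref,) = torch.autograd.grad(F.gelu(x).sum(), [x])
print(torch.allclose(g_manual, g_ref, atol=1e-5))
```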

I was doing it via torch.onnx.export(), as there seems to be no support at all for autograd in torch.onnx.dynamo_export(). I’ve managed to brute-force it by hacking together a patched PyTorch build in which the torch.onnx.export() pathway works, though I’m still not 100% sure whether it was ever meant to work in the first place.

I’d like to be able to do this with an official PyTorch release, though, and my hacked version isn’t really suitable for contributing to the project at the moment, because 1) I don’t know whether anyone wants to do further work on torch.onnx.export() anyway, and 2) some of my workarounds are pretty hacky. I could probably put together a write-up of the issues I encountered and push a branch with my workarounds to GitHub, though.

There are definitely use cases for exporting models containing backprop to onnx, e.g. diffusion models with classifier guidance or optimizing latent codes at runtime, so it would be good to get proper support for it one way or another.
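To illustrate the second use case, here is a minimal sketch of a module that refines a latent code with a few gradient-descent steps inside forward(). It uses the same detach / requires_grad_ / enable_grad / autograd.grad pattern as the repro in the issue. `LatentOptimizer`, the toy linear decoder and the squared-error objective are hypothetical stand-ins. Whether a model like this exports cleanly depends on the exporter issues described above.

```python
import torch


class LatentOptimizer(torch.nn.Module):
    """Hypothetical module: refine a latent code z with a fixed number of
    gradient-descent steps on ||decoder(z) - target||^2 inside forward()."""

    def __init__(self, decoder: torch.nn.Module, steps: int = 5, lr: float = 0.1):
        super().__init__()
        self.decoder = decoder
        self.steps = steps
        self.lr = lr

    def forward(self, z: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Same pattern as the repro: differentiate only w.r.t. a detached input.
        z = z.detach().requires_grad_()
        with torch.enable_grad():
            for _ in range(self.steps):
                loss = ((self.decoder(z) - target) ** 2).sum()
                (grad,) = torch.autograd.grad([loss], [z])
                # Plain gradient step, then start a fresh graph for the next step.
                z = (z - self.lr * grad).detach().requires_grad_()
        return z.detach()


torch.manual_seed(0)
decoder = torch.nn.Linear(4, 3)
# Freeze the decoder so only the latent code receives gradients.
for p in decoder.parameters():
    p.requires_grad_(False)

model = LatentOptimizer(decoder, steps=20, lr=0.05)
z0 = torch.zeros(4)
target = torch.randn(3)
z = model(z0, target)
```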
