Recently we successfully ran TorchDynamo on 1K+ GitHub projects (a total of 7K+ models/test cases) collected using a crawling script. This is an important milestone, as it demonstrates that TorchDynamo is the most reliable out-of-the-box (OOB) graph capture solution for PyTorch to date.
This post offers more details on this work, including the quality of the graphs captured and the kinds of problems fixed along the way.
TorchDynamo
If you are new to TorchDynamo, the links below will help you catch up on this exploration. TorchDynamo generates FX graphs from Python bytecode, and various backends integrate with TorchDynamo to run inference/training on the captured graphs. In the future, with the help of a cost model, TorchDynamo could automate selecting the best backend for each subgraph to achieve optimal performance.
- Update 1: An Experiment in Dynamic Python Bytecode Transformation
- Update 2: 1.48x Geomean Speedup on TorchBench CPU Inference
- Update 3: GPU Inference Edition
- Update 4: Lazy Tensors & nvFuser Experiments
- Update 5: Improved Capture and Bigger Graphs
- Update 6: Training support with AOTAutograd
- Update 7: Inference with FX2TRT
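Since TorchDynamo works at the Python bytecode level, a rough way to see the raw material it analyzes is the standard `dis` module. The sketch below is plain Python (no PyTorch required) and only illustrates what a frame-evaluation hook sees before extracting tensor operations into an FX graph; the `forward` function here is a made-up toy, not a real model:

```python
import dis

def forward(x, weight, bias):
    # A toy "model": the bytecode stream of this function is what a
    # frame-evaluation hook like TorchDynamo's would symbolically evaluate.
    h = x * weight
    return h + bias

# List the opcode names of the function -- the instruction stream
# TorchDynamo walks to build an FX graph of the tensor operations.
opnames = [ins.opname for ins in dis.Bytecode(forward)]
print(opnames)
```

The exact opcodes vary by Python version, but the stream always contains loads of the arguments, the binary operations, and a return, which Dynamo maps onto FX graph nodes.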
How did we set up Dynamo’s 1K+ GitHub project evaluation?
Model selection criteria
- Any GitHub project with 100+ stars that includes “PyTorch” as a keyword.
Testing goal
- No exceptions thrown
- Getting correct results
Testing data
Testing tool
- Modified the pytorch-jit-paritycheck project to support TorchDynamo vs. PyTorch eager parity checks.
Running TorchDynamo in default mode – Test w/ Graph Break
We first ran the models using TorchDynamo's default mode. In this mode, graph capture falls back to eager execution for any Python construct the compiler does not support, causing a graph break. This mode has the best UX (completely OOB), at the expense of sometimes capturing partial graphs instead of a whole graph.
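Conceptually, the default mode behaves like the pure-Python sketch below. The names (`capture`, `optimize`, `Unsupported`, the `fn.unsupported` flag) are all hypothetical stand-ins, not real Dynamo APIs, and real Dynamo splits graphs at the bytecode level rather than via exceptions; the sketch only shows the fallback policy:

```python
class Unsupported(Exception):
    """Raised by the (hypothetical) capture step on constructs it cannot trace."""

def capture(fn, *args):
    # Stand-in for graph capture: treat functions flagged with
    # fn.unsupported as containing constructs the tracer cannot handle.
    if getattr(fn, "unsupported", False):
        raise Unsupported(f"cannot trace {fn.__name__}")
    return fn(*args)  # stands in for running the compiled graph

def optimize(fn):
    """Sketch of default-mode behavior: on a graph break, fall back to eager."""
    def wrapped(*args):
        try:
            return capture(fn, *args)
        except Unsupported:
            # Graph break: run the original Python (eager) instead of failing.
            return fn(*args)
    return wrapped
```

Either way the user gets a correct result; the only difference is whether the work ran as a captured graph or as ordinary Python.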
Starting point – the 1st run on May 1st
The following table shows our first evaluation conducted on May 1st. It showed that TorchDynamo already achieved a pretty high success rate.
| | total | passing | success rate |
|---|---|---|---|
| projects | 1111 | 1035 | 93.2% |
| tests | 7549 | 7399 | 98.0% |
As we dug into the errors, we identified 7 distinct bugs that accounted for the 141 runtime errors and 4 distinct bugs that accounted for the 9 correctness errors. The following list gives examples of the kinds of bugs/issues we discovered.
- X.new acts differently with a Size input versus a tuple input
- Prioritize class method if there is duplicated attribute name
- List slice bug
- Dict: TorchDynamo was using sorted(dict.keys()) to order keys, which breaks Python's insertion-order semantics
- Inline translator bug: list/dict mutation
- Codegen for variables with side effects
- Variables mutation and tracking
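The dict-ordering bug above is a good example of how subtle these issues can be. Python dicts preserve insertion order (guaranteed since Python 3.7), so rebuilding a dict from sorted keys silently changes iteration order, which matters for anything that iterates over module or parameter dicts:

```python
# Python dicts preserve insertion order (a language guarantee since 3.7).
d = {"weight": 1, "bias": 2, "activation": 3}

# Roughly what the buggy path did: rebuild the dict from sorted keys.
# The values are intact, but the iteration order has changed.
sorted_rebuild = {k: d[k] for k in sorted(d)}

print(list(d))               # insertion order
print(list(sorted_rebuild))  # alphabetical order -- a semantic change
```

A captured graph that iterates the rebuilt dict would visit entries in a different order than eager mode, producing the kind of correctness divergence the parity check caught.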
End-point on June 10th
On June 10th, after fixing all the bugs, we hit the 100% goal!
| | total | passing | success rate |
|---|---|---|---|
| projects | 1112 | 1112 | 100.0% |
| tests | 7560 | 7560 | 100.0% |
Graph and Graph Break Characteristics
So what is the quality of the graphs captured by TorchDynamo? The following stats shed some light:
- Average unique graphs in each model: 1.5
- The largest model by PyTorch operators: 4516
- test_asappresearch_flambe.HyperbolicMean is a hyperboloid model with 100 iterations.
- Average number of PyTorch operators run inside TorchDynamo per model: 33
We did observe that some models generated a lot of graph breaks. The following lists the top 10 models by number of graphs captured. These models will be the focus of our future work on improving Dynamo's full-graph mode.
- 84: test_jankrepl_deepdow.NCO
- 78: test_OniroAI_MonoDepth_PyTorch.Resnet50_md
- 71: test_666DZY666_model_compression.Net
- 69: test_itayhubara_BinaryNet_pytorch.VGG_Cifar10
- 66: test_lenscloth_RKD.AllPairs
- 64: test_sshuair_torchsat.DenseNet
- 64: test_rahulvigneswaran_Lottery_Ticket_Hypothesis_in_Pytorch.DenseNet
- 61: test_iwasaki_kenta_keita.TCML
- 56: test_weiaicunzai_pytorch_cifar100.ShuffleNet
- 53: test_gpleiss_efficient_densenet_pytorch.DenseNet
Running Dynamo in the full graph mode (aka nopython=True)
We next evaluated Dynamo’s ability to capture full graphs using the flag nopython=True. In this mode, instead of breaking the graph and falling back to eager when it encounters an unsupported Python construct, Dynamo deliberately aborts and provides hints to help users fix the graph break. This mode is especially important for providing a smooth UX transition from partial-graph capture (default, OOB, eager fallback) to full-graph capture (human-in-the-loop, export) using the same toolchain.
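In spirit, nopython=True turns every would-be graph break into an actionable error instead of a silent fallback. The sketch below uses hypothetical names (`GraphBreakError`, `check_full_graph`); real Dynamo reports the offending bytecode and source location, which this simplified version stands in for with plain reason strings:

```python
class GraphBreakError(RuntimeError):
    """Raised instead of silently falling back when nopython=True."""

def check_full_graph(break_reasons):
    # break_reasons: graph-break reasons collected during capture, e.g.
    # "call into numpy" or "data-dependent control flow" (illustrative).
    if break_reasons:
        hints = "\n".join(f"  - {r}" for r in break_reasons)
        raise GraphBreakError(
            "nopython=True, but capture would break the graph:\n" + hints
        )
    return True  # a single whole-program graph was captured
```

The error message doubles as the "hint" mentioned above: the user sees each break reason and can rewrite the offending code until capture succeeds with one graph.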
The 3rd round run - aborting on graph breaks
This table shows the coverage using the full-graph mode. As expected, the success rate dropped from 100%.
| | total | passing | success rate |
|---|---|---|---|
| projects | 1112 | 704 | 63.3% |
| tests | 7561 | 6383 | 84.4% |
For all the models that passed without falling back to Python (i.e., exactly one unique graph per model):
- Average PyTorch operators captured in each model: 31
- The largest model by PyTorch operators: 4516
- test_asappresearch_flambe.HyperbolicMean, the same as in the last run.
Future Work
Looking at the graph break reasons, these are some of the top ones:
- Non-const NNModule method
- Call function in skip files, e.g., collections
- Data dependency and control flow
- Usage of non-PyTorch libraries, e.g., NumPy
Some of these breaks must be respected, while others need new support added, so we will categorize them and treat them differently:
- Wrap these exceptions (e.g., unimplemented) with more readable error messages to provide a better user experience.
- Prioritize the top k graph break reasons and implement these features to avoid the graph break.
- Open issues to track graph break reasons whose fixes will take time to implement.