Recently we successfully ran TorchDynamo on 1K+ GitHub projects (a total of 7K+ models/test cases) collected using a crawling script. This is an important milestone, as it demonstrates that TorchDynamo is the most reliable out-of-the-box (OOB) graph capture solution for PyTorch to date.
This post offers more details on this work, including the quality of the graphs captured and the kinds of problems fixed along the way.
TorchDynamo
If you are new to TorchDynamo, the links below will help you catch up on this exploration. TorchDynamo generates FX graphs from Python bytecode, and various backends integrate with TorchDynamo to run inference/training on the captured graphs. In the future, with the help of a cost model, TorchDynamo could automate selecting the best backend for each subgraph to achieve optimal performance.
- Update 1: An Experiment in Dynamic Python Bytecode Transformation
- Update 2: 1.48x Geomean Speedup on TorchBench CPU Inference
- Update 3: GPU Inference Edition
- Update 4: Lazy Tensors & nvFuser Experiments
- Update 5: Improved Capture and Bigger Graphs
- Update 6: Training support with AOTAutograd
- Update 7: Inference with FX2TRT
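Since TorchDynamo works at the Python bytecode level, a rough way to see the raw material it analyzes is the standard `dis` module. The sketch below is plain Python (no PyTorch required) and only illustrates what a frame-evaluation hook sees before extracting tensor operations into an FX graph; the `forward` function here is a made-up toy, not a real model:

```python
import dis

def forward(x, weight, bias):
    # A toy "model": the bytecode stream of this function is what a
    # frame-evaluation hook like TorchDynamo's would symbolically evaluate.
    h = x * weight
    return h + bias

# List the opcode names of the function -- the instruction stream
# TorchDynamo walks to build an FX graph of the tensor operations.
opnames = [ins.opname for ins in dis.Bytecode(forward)]
print(opnames)
```

The exact opcodes vary by Python version, but the stream always contains loads of the arguments, the binary operations, and a return, which Dynamo maps onto FX graph nodes.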
How did we set up Dynamo’s 1K+ GitHub project evaluation?
Model selection criteria
- Any GitHub project with 100+ stars that includes “PyTorch” as a keyword.
Testing goal
- No exceptions thrown
- Getting correct results
Testing data
Testing tool
- Modified the pytorch-jit-paritycheck project to support TorchDynamo vs. PyTorch eager parity checks.
Running TorchDynamo in default mode – Test w/ Graph Break
We first ran the models using TorchDynamo's default mode. In this mode, graph capture falls back to eager execution for any Python construct the compiler does not support, causing a graph break. This mode has the best UX (completely OOB), at the expense of sometimes capturing partial graphs instead of a whole graph.
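Conceptually, the default mode behaves like the pure-Python sketch below. The names (`capture`, `optimize`, `Unsupported`, the `fn.unsupported` flag) are all hypothetical stand-ins, not real Dynamo APIs, and real Dynamo splits graphs at the bytecode level rather than via exceptions; the sketch only shows the fallback policy:

```python
class Unsupported(Exception):
    """Raised by the (hypothetical) capture step on constructs it cannot trace."""

def capture(fn, *args):
    # Stand-in for graph capture: treat functions flagged with
    # fn.unsupported as containing constructs the tracer cannot handle.
    if getattr(fn, "unsupported", False):
        raise Unsupported(f"cannot trace {fn.__name__}")
    return fn(*args)  # stands in for running the compiled graph

def optimize(fn):
    """Sketch of default-mode behavior: on a graph break, fall back to eager."""
    def wrapped(*args):
        try:
            return capture(fn, *args)
        except Unsupported:
            # Graph break: run the original Python (eager) instead of failing.
            return fn(*args)
    return wrapped
```

Either way the user gets a correct result; the only difference is whether the work ran as a captured graph or as ordinary Python.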
Starting point – the 1st run on May 1st
The following table shows our first evaluation conducted on May 1st. It showed that TorchDynamo already achieved a pretty high success rate.
| | total | passing | success rate |
|---|---|---|---|
| projects | 1111 | 1035 | 93.2% |
| tests | 7549 | 7399 | 98.0% |
As we dug into the errors, we identified 7 distinct bugs that accounted for the 141 runtime errors and 4 distinct bugs that accounted for the 9 correctness errors. The following list gives examples of the kinds of bugs/issues we discovered.
- X.new acts differently with a Size input versus a tuple input
- Prioritize class method if there is duplicated attribute name
- List slice bug
- Dict: TorchDynamo was using sorted(dict.keys()) to order keys, which breaks Python's insertion-order semantics
- Inline translator bug: list/dict mutation
- Codegen for variables with side effects
- Variables mutation and tracking
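The dict-ordering bug above is a good example of how subtle these issues can be. Python dicts preserve insertion order (guaranteed since Python 3.7), so rebuilding a dict from sorted keys silently changes iteration order, which matters for anything that iterates over module or parameter dicts:

```python
# Python dicts preserve insertion order (a language guarantee since 3.7).
d = {"weight": 1, "bias": 2, "activation": 3}

# Roughly what the buggy path did: rebuild the dict from sorted keys.
# The values are intact, but the iteration order has changed.
sorted_rebuild = {k: d[k] for k in sorted(d)}

print(list(d))               # insertion order
print(list(sorted_rebuild))  # alphabetical order -- a semantic change
```

A captured graph that iterates the rebuilt dict would visit entries in a different order than eager mode, producing the kind of correctness divergence the parity check caught.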
End-point on June 10th
On June 10th, after fixing all the bugs, we hit the 100% goal!
| | total | passing | success rate |
|---|---|---|---|
| projects | 1112 | 1112 | 100.0% |
| tests | 7560 | 7560 | 100.0% |
Graph and Graph Break Characteristics
So what is the quality of the graphs captured by TorchDynamo? The following stats shed some light:
- Average unique graphs in each model: 1.5
- The largest model by PyTorch operators: 4516
- test_asappresearch_flambe.HyperbolicMean is a hyperboloid model with 100 iterations.
- Average number of PyTorch operators run inside TorchDynamo per model: 33
We did observe that some models generated a lot of graph breaks. The following lists the top 10 models by number of graphs captured. These models will be the focus of our future work on improving Dynamo's full-graph mode.
- 84: test_jankrepl_deepdow.NCO
- 78: test_OniroAI_MonoDepth_PyTorch.Resnet50_md
- 71: test_666DZY666_model_compression.Net
- 69: test_itayhubara_BinaryNet_pytorch.VGG_Cifar10
- 66: test_lenscloth_RKD.AllPairs
- 64: test_sshuair_torchsat.DenseNet
- 64: test_rahulvigneswaran_Lottery_Ticket_Hypothesis_in_Pytorch.DenseNet
- 61: test_iwasaki_kenta_keita.TCML
- 56: test_weiaicunzai_pytorch_cifar100.ShuffleNet
- 53: test_gpleiss_efficient_densenet_pytorch.DenseNet
Running Dynamo in the full graph mode (aka nopython=True)
We next evaluated Dynamo’s ability to capture full graphs using the flag nopython=True. In this mode, instead of breaking the graph and falling back to eager when it encounters an unsupported Python construct, Dynamo deliberately aborts and provides hints to help users fix the graph break. This mode is especially important for providing a smooth UX transition from partial-graph capture (default, OOB, eager fallback) to full-graph capture (human-in-the-loop, export) using the same toolchain.
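In spirit, nopython=True turns every would-be graph break into an actionable error instead of a silent fallback. The sketch below uses hypothetical names (`GraphBreakError`, `check_full_graph`); real Dynamo reports the offending bytecode and source location, which this simplified version stands in for with plain reason strings:

```python
class GraphBreakError(RuntimeError):
    """Raised instead of silently falling back when nopython=True."""

def check_full_graph(break_reasons):
    # break_reasons: graph-break reasons collected during capture, e.g.
    # "call into numpy" or "data-dependent control flow" (illustrative).
    if break_reasons:
        hints = "\n".join(f"  - {r}" for r in break_reasons)
        raise GraphBreakError(
            "nopython=True, but capture would break the graph:\n" + hints
        )
    return True  # a single whole-program graph was captured
```

The error message doubles as the "hint" mentioned above: the user sees each break reason and can rewrite the offending code until capture succeeds with one graph.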
The 3rd round run - aborting on graph breaks
This table shows the coverage using the full-graph mode. As expected, the success rate dropped from 100%.
| | total | passing | success rate |
|---|---|---|---|
| projects | 1112 | 704 | 63.3% |
| tests | 7561 | 6383 | 84.4% |
For all the models that passed without falling back to Python (i.e., exactly one unique graph per model):
- Average PyTorch operators captured in each model: 31
- The largest model by PyTorch operators: 4516
- test_asappresearch_flambe.HyperbolicMean, the same as in the last run.
Future Work
Looking at the graph break reasons, these are some of the top ones:
- Non-const NNModule method
- Call function in skip files, e.g., collections
- Data dependency and control flow
- Usage of non-PyTorch libraries, e.g., NumPy
Some of these breaks must be respected, while others need new support added, so we will categorize them and treat them differently:
- Wrap these exceptions (e.g., unimplemented) with more readable error messages to provide a better user experience.
- Prioritize the top k graph break reasons and implement these features to avoid the graph break.
- Open issues to track graph break reasons whose fixes will take time to implement.