Hi team, I’m trying to make Dynamo more understandable to users, so that they can debug it and see whether Dynamo is doing what they want. I opened this discussion to document the progress and the problems I have met so far.
The first step is to expose cache entries to users, which has been done in PR1 and PR2. Now users can call `torch._dynamo.eval_frame._debug_get_cache_entry_list` to retrieve cache entries from functions optimized by Dynamo. The doc shows a usage example.
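To make the idea concrete, here is a minimal pure-Python sketch of the mental model behind those cache entries. This is an illustration only, not the actual C-level data structure inside Dynamo; the names `check_fn`, `code`, and `src` are stand-ins chosen for this example.

```python
# Sketch of the cache-entry mental model: each optimized function carries a
# list of entries, each pairing a guard (a predicate over the inputs) with a
# compiled code version. A call dispatches to the first entry whose guard
# passes; if none pass, Dynamo would recompile.

class CacheEntry:
    def __init__(self, check_fn, code, src):
        self.check_fn = check_fn  # guard: returns True if this entry applies
        self.code = code          # compiled version to run when the guard passes
        self.src = src            # human-readable guard source (what we want to expose)

def dispatch(entries, *args):
    """Walk the cache list and run the first entry whose guard passes."""
    for entry in entries:
        if entry.check_fn(*args):
            return entry.code(*args)
    raise RuntimeError("cache miss: Dynamo would recompile here")

# Two specializations of the same function, guarded on the sign of the input.
entries = [
    CacheEntry(lambda x: x >= 0, lambda x: x * 2, "x >= 0"),
    CacheEntry(lambda x: x < 0, lambda x: -x, "x < 0"),
]
```

With this model, "exposing cache entries" means letting users iterate over `entries` and read each `src`.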
The second step is to explain the artifacts generated by Dynamo to users. These mainly exist as Python bytecode, which is difficult for users to read and understand. There are four categories of artifacts generated by Dynamo: guards, compiled code, compiled partial graphs, and resume functions. Let’s explain them one by one.
- Guards: Dynamo actually has the source code for guards, so we can expose it to users so that they don’t have to read bytecode. This can be achieved by an ongoing PR. After that, we can use `guard.src` to show the guard’s source code to users.
- Compiled partial graphs: since we focus on understanding and debugging Dynamo, we can use an “eager” backend; then the compiled partial graphs are generated by `torch.fx`, with readable source code.
- Resume functions: these functions are parts of the original function, so maybe we can just print the original source code to users and tell them where each resume function starts.
- Compiled code: this is the function that assembles the compiled partial graphs and resume functions into the final code. I assume this function’s bytecode is not very complicated, so maybe my simple decompiler will work.
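To show how these artifacts fit together, here is a hand-written pure-Python sketch (no torch) of the transformation on a small function with a data-dependent branch. The names `__compiled_fn_0` and `__resume_at_...` mirror Dynamo’s naming scheme, but the function bodies here are illustrative stand-ins, not real Dynamo output.

```python
# Original function: the branch on b depends on runtime data, so Dynamo
# would insert a graph break here and split the function into pieces.
def original(a, b):
    x = a + 1
    if b < 0:
        b = -b
    return x * b

# Compiled partial graph: everything before the branch, plus the condition.
def __compiled_fn_0(a, b):
    x = a + 1
    return x, b < 0

# Resume functions: the continuations after the branch.
def __resume_at_true(x, b):
    b = -b
    return x * b

def __resume_at_false(x, b):
    return x * b

# "Compiled code" artifact: the glue that calls the partial graph, then
# dispatches to the right resume function. This is the piece that exists
# only as bytecode and that a decompiler would render back to source.
def rewritten(a, b):
    x, cond = __compiled_fn_0(a, b)
    if cond:
        return __resume_at_true(x, b)
    return __resume_at_false(x, b)
```

The rewritten version computes the same result as the original for any inputs, which is the invariant the real transformation preserves.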
The final goal might look like this:
```python
from typing import List

import torch
from torch import _dynamo as torchdynamo

def my_compiler(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    print("my_compiler() called with FX graph:")
    gm.graph.print_tabular()
    return gm.forward  # return a python callable

@torchdynamo.optimize(my_compiler)
def toy_example(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:
        b = b * -1
    return x * b

for _ in range(100):
    toy_example(torch.randn(10), torch.randn(10))
```
After we compile the function, we can describe what Dynamo does for users:
```python
torchdynamo.describe(toy_example)
```
The desired output:
```
This is a function optimized by Dynamo. It has {n} cache entries.

Cache Entry 1:

Guard:
{Guard Code}

Code:
{Decompiled Source Code}

There are {m} subgraphs in the function:

SubGraph 1: __compiled_fn_{i}
Source code of subgraph function 1:

There are {p} resume functions:

Resume function 1: __resume_at_{offset}_{j}

def toy_example(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:  # <== resume in this line
        b = b * -1
    return x * b
```
I’m seeking feedback from this forum to see whether the proposal is valuable and worth investing in.