I’m exploring the differences in graph capturing behavior between using torchdynamo and torch.fx symbolic_trace. Specifically, I’ve noticed that when tracing models using Dynamo, the get_attr
nodes often get automatically converted to placeholder
nodes.
Here’s a simple runnable demo that illustrates what I’m observing:
import torch
from torch.fx import GraphModule, Tracer, Graph
from torch.export import export
class CustomModel(torch.nn.Module):
def __init__(self, hidden_size):
super().__init__()
self.w1 = torch.nn.Parameter(torch.empty(hidden_size, hidden_size))
def forward(self, x):
x = torch.mm(x, self.w1)
# x = gelu(x) # Uncomment for non-linear operations
x = x.sum()
return (x,)
if __name__ == "__main__":
torch.set_default_dtype(torch.bfloat16)
with torch.device("meta"):
hidden_size = 1024
model = CustomModel(hidden_size)
inp = torch.zeros(2, hidden_size, requires_grad=True)
tracer = Tracer()
graph = tracer.trace(model)
graph.print_tabular()
exported_program: torch.export.ExportedProgram = export(model, args=(inp,))
gm = exported_program.graph_module
gm.graph.print_tabular()
The output using torch.fx
directly vs. using export
(which I presume uses Dynamo internally) shows different behaviors. In the FX trace, parameters are retained as get_attr
, whereas in the Dynamo-based trace, they are converted to placeholder
.
Output using FX:
opcode name target args kwargs
------------- ------ ----------------------------------------------------- ----------- --------
placeholder x x () {}
get_attr w1 w1 () {}
call_function mm <built-in method mm of type object at 0x7f661c873500> (x, w1) {}
call_method sum_1 sum (mm,) {}
output output output ((sum_1,),) {}
Output using Dynamo:
opcode name target args kwargs
------------- ------ ---------------- ----------- --------
placeholder p_w1 p_w1 () {}
placeholder x x () {}
call_function mm aten.mm.default (x, p_w1) {}
call_function sum_1 aten.sum.default (mm,) {}
output output output ((sum_1,),) {}
I am curious why this happens, and if there’s a way to control or prevent this behavior when using Dynamo. Any insights or recommendations on how to handle this discrepancy would be greatly appreciated.