TorchInductor: a PyTorch-native Compiler with Define-by-Run IR and Symbolic Shapes

I was going through the inductor code base and found the following:

The above code inside compile_fx_inner calls cudagraphify, which ultimately (via CUDAGraph Trees) tries to capture compiled_graph.current_callable as a CUDA graph.

As per the signature of cudagraphify the first argument is:

So, compiled_graph.current_callable is of type torch.fx.GraphModule.

Now my question is, what is this compiled_graph.current_callable: torch.fx.GraphModule?

In the compile_fx_inner function, I find compiled_graph defined as above.

Now gm is again a torch.fx.GraphModule.

So, questions here:

  1. What is the torch.fx.GraphModule instance that compile_fx_inner takes as input? Is it an FX graph module passed down from TorchDynamo?
  2. What is the torch.fx.GraphModule instance that is passed to cudagraphify in compile_fx_inner? Is it an optimized GraphModule with Triton kernels built into it?
  3. Finally, is the following understanding correct?
    We parse PyTorch code to produce many FX graphs (many because there may be graph breaks). Each of these FX graphs is a GraphModule, and TorchInductor generates Triton code from each one. So, in TorchInductor, we have a set of GraphModules (compiled to use Triton), and for each of these GraphModules we then decide whether to use CUDA Graphs.
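
To make point 3 concrete, here is a minimal sketch of how I would check the "many FX graphs" part from the outside. The names inspecting_backend, captured, and f are hypothetical and not part of Inductor. The backend only records each graph it is given and runs it unmodified, and torch._dynamo.graph_break() is used to force a break in place of a naturally occurring one. This only shows what TorchDynamo hands to a backend; it says nothing about what compile_fx_inner receives or what cudagraphify is given.

```python
import torch
import torch._dynamo

# Hypothetical list used to record every graph TorchDynamo hands to the backend.
captured = []


def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    """Toy backend (an assumption for illustration): record the graph, run it as-is."""
    captured.append(gm)
    return gm.forward  # no compilation, just execute the captured FX graph


def f(x):
    y = torch.sin(x)
    torch._dynamo.graph_break()  # force a graph break between the two ops
    return torch.cos(y)


compiled_f = torch.compile(f, backend=inspecting_backend)

x = torch.linspace(0.0, 1.0, 4)
out = compiled_f(x)

# Each entry is one FX graph produced for one region between graph breaks.
print(len(captured))
for gm in captured:
    print(type(gm).__name__)
    print(gm.code)
```

Swapping backend=inspecting_backend for the default backend is where TorchInductor would take over for each of these graphs.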