TorchInductor: a PyTorch-native Compiler with Define-by-Run IR and Symbolic Shapes

Abhishekghosh1998 · March 25, 2024, 4:35pm

I was going through the Inductor codebase and found the following:

pytorch/pytorch/blob/0465a90b00fa9f308a81e3a054f673977325ae35/torch/_inductor/compile_fx.py#L525C1-L534C14


      
          compiled_graph.current_callable = cudagraphify(
              compiled_graph.current_callable,
              example_inputs,
              static_input_idxs=range(num_fixed),
              device_index=next(iter(compiled_graph.device_idxs)),
              stack_traces=stack_traces,
              is_backward=is_backward,
              is_inference=is_inference,
              constants=tuple(compiled_graph.constants.values()),
          )

The above code inside compile_fx_inner calls the cudagraphify function, which ultimately (via CUDAGraph Trees) tries to build a CUDA graph out of compiled_graph.current_callable.
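To make the shape of that call easier to reason about, here is a toy, CPU-only analog of the "wrap a callable, keep fixed input buffers, copy new inputs in before each run" pattern. It is only a sketch under stated assumptions: toy_graphify, add_weight, and the list-based buffers are hypothetical names invented for illustration, no CUDA graph is captured, and the real cudagraphify in torch/_inductor may work quite differently in its details.

```python
from typing import Callable, List, Sequence


def toy_graphify(
    fn: Callable[[List[list]], list],
    example_inputs: Sequence[list],
    static_input_idxs: Sequence[int] = (),
) -> Callable[[Sequence[list]], list]:
    """Hypothetical stand-in: wrap fn so it always reads from fixed buffers."""
    static = set(static_input_idxs)
    # Static inputs are used in place; every other input gets a private buffer.
    buffers = [
        inp if i in static else list(inp) for i, inp in enumerate(example_inputs)
    ]

    def run(new_inputs: Sequence[list]) -> list:
        for i, inp in enumerate(new_inputs):
            if i not in static:
                # Copy into the fixed buffer; the buffer object never changes.
                buffers[i][:] = inp
        # In a real CUDA graph this step would be a replay of recorded kernels.
        return fn(buffers)

    return run


def add_weight(bufs: List[list]) -> list:
    weight, x = bufs
    return [w + v for w, v in zip(weight, x)]


weight = [10, 20]
wrapped = toy_graphify(add_weight, [weight, [0, 0]], static_input_idxs=[0])
first = wrapped([weight, [1, 2]])
second = wrapped([weight, [3, 4]])
print(first, second)  # [11, 22] [13, 24]
```

Note that the wrapper only needs its first argument to be callable, and it hands back another callable with the same calling convention.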

According to the signature of cudagraphify, the first argument is annotated as:

pytorch/pytorch/blob/0465a90b00fa9f308a81e3a054f673977325ae35/torch/_inductor/compile_fx.py#L820C1-L821C33


      
          def cudagraphify(
              model: torch.fx.GraphModule,

So, going by this annotation, compiled_graph.current_callable should be of type torch.fx.GraphModule.
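One caveat about inferring the runtime type from a signature: Python does not check annotations when a function is called, so the annotation on its own does not establish what object is actually passed. The sketch below shows this with the standard library only; GraphModuleLike, takes_graph_module, and plain_function are hypothetical stand-ins (not PyTorch names) so that it runs without PyTorch installed.

```python
import inspect


class GraphModuleLike:
    """Hypothetical stand-in for torch.fx.GraphModule."""


def takes_graph_module(model: GraphModuleLike) -> str:
    # The annotation is only metadata; nothing verifies it at call time.
    return type(model).__name__


def plain_function(inputs):
    return inputs


# Passing a plain function where GraphModuleLike is annotated raises no error.
runtime_type = takes_graph_module(plain_function)
print(runtime_type)  # function

# The annotation is still visible to tools through introspection.
annotation = inspect.signature(takes_graph_module).parameters["model"].annotation
print(annotation is GraphModuleLike)  # True
```

A direct way to settle the question would be to print type(compiled_graph.current_callable) at the call site rather than rely on the annotation.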

Now my question is, what is this compiled_graph.current_callable: torch.fx.GraphModule?

pytorch/pytorch/blob/0465a90b00fa9f308a81e3a054f673977325ae35/torch/_inductor/compile_fx.py#L453C1-L455C10


      
          compiled_graph = fx_codegen_and_compile(
              gm, example_inputs, **graph_kwargs  # type: ignore[arg-type]
          )

In the compile_fx_inner function, I find compiled_graph defined as above.

pytorch/pytorch/blob/0465a90b00fa9f308a81e3a054f673977325ae35/torch/_inductor/compile_fx.py#L588C1-L589C30


      
          def fx_codegen_and_compile(
              gm: torch.fx.GraphModule,

Now gm is again a torch.fx.GraphModule.

So, my questions are:

1. What is the torch.fx.GraphModule instance that compile_fx_inner takes as input? Is it an FX graph module passed down from TorchDynamo?
2. What is the torch.fx.GraphModule instance that is passed to cudagraphify in compile_fx_inner? Is it an optimized GraphModule with Triton kernels built into it?
3. Is my following understanding correct? We parse PyTorch code to produce many FX graphs (many because there may be graph breaks). These FX graphs are GraphModules, and TorchInductor generates Triton code from each of them. So, in TorchInductor, we have a set of GraphModules (compiled to use Triton), and for each of these GraphModules we then decide whether to use CUDA Graphs.
