Is there any documentation on the sequence of fx passes that are run by Inductor?
For example, in the Writing Custom Backends guide, Dynamo passes a torch.fx.GraphModule to the backend. However, this graph is high-level, and some normalization is needed to decompose it into core ATen ops, an IR more amenable to analysis and optimization.
I've found scattered bits: pattern_matcher, which does this decomposition, and fx_passes in the Inductor source directory, which further lowers the normalized IR. But I don't have a clear picture of the entire flow from the high-level fx.GraphModule to optimized Triton code.
Is there any documentation on the step-by-step flow Inductor performs, from the input graph it receives from Dynamo to the optimized kernels / graph, and how those steps map to the torch._inductor source directory?
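(To make the question concrete, here is a minimal custom backend in the style of that guide; the function and shapes are just illustrative. Printing gm.graph shows the high-level, pre-normalization IR I mean.)

```python
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # Dynamo hands the backend a high-level FX graph: it still contains
    # torch-level ops (torch.relu, operator.add), not decomposed core ATen IR.
    print(gm.graph)
    return gm.forward  # run the captured graph as-is

@torch.compile(backend=inspect_backend)
def f(x):
    return torch.relu(x) + 1

f(torch.randn(4))
```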
compile_fx is the primary entry point, so if you are interested in the nitty-gritty, that function is probably the place to start.
To zoom out for an aerial view though, the structure is something like:
Pre-grad Passes
Pre-grad passes are FX graph optimizations TorchInductor runs before autograd. They transform the graph to improve performance through operator fusion, elimination of redundant operations, and graph simplification. Pre-grad passes run before functionalization and normalization, making them more challenging to write than post-grad passes since they must consider aliasing, mutation, and all possible argument schemas.
Joint-graph Passes
Joint-graph passes are FX graph optimizations TorchInductor runs on the combined forward and backward graph after AOT Autograd tracing. These passes operate on normalized ATen IR, making them easier to write than pre-grad passes since the IR is functional and standardized. Joint-graph passes focus on optimizations that benefit from seeing both forward and backward operations together.
Post-grad Passes
Post-grad passes are FX graph optimizations TorchInductor runs separately on the forward and backward graphs after gradient computation and functionalization. These passes operate on normalized, functional ATen IR, making pattern matching and transformations more straightforward than pre-grad passes. Post-grad passes focus on low-level optimizations that benefit from seeing the fully decomposed and functionalized representation.
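To get a feel for the normalized, functional ATen IR that the joint-graph and post-grad passes operate on, make_fx gives a rough approximation (this is only an illustration; Inductor's actual pipeline goes through AOT Autograd and compile_fx):

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return (x * 2).relu()

# Tracing through the dispatcher yields a functional graph of ATen ops
# (aten.mul.Tensor, aten.relu.default, ...), which is much easier to
# pattern-match against than the high-level graph Dynamo captures.
gm = make_fx(f, tracing_mode="fake")(torch.randn(4))
print(gm.graph)
```

For actually hooking into these stages, torch._inductor.config exposes custom-pass hooks (e.g. post_grad_custom_post_pass, going by my reading of the source; the exact name and expected callable type vary by version, so check your release).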
If you would like, you can send me your email and I can send a more comprehensive (albeit WIP) document on inductor - we are hoping to improve documentation on this particular subject in the near term though!
Thanks for the info @Lucaskabela! I would also be interested in the Inductor doc if possible. My email is “vguerra at gmail”. Thanks!