Is there any documentation on the sequence of fx passes that are run by Inductor?
E.g., in the Writing Custom Backends Guide, Dynamo passes a torch.fx.GraphModule to the backend. However, this graph is high-level, and some normalization is needed to decompose it into core ATen ops and an IR more amenable to analysis and optimization.
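To make the two levels concrete, here is a minimal sketch of what I mean. It wraps a custom backend with `aot_autograd` and `core_aten_decompositions()` so the graph can be inspected both as Dynamo hands it over and after decomposition. The names `two_stage_backend`, `inspect_compiler`, and `captured` are my own, and this is not necessarily how Inductor itself wires things up.

```python
import torch
from torch._dynamo.backends.common import aot_autograd
from torch._decomp import core_aten_decompositions
from functorch.compile import make_boxed_func

captured = {}  # hypothetical scratch dict used only for inspection


def inspect_compiler(gm: torch.fx.GraphModule, example_inputs):
    # Called by AOTAutograd after the requested decompositions were applied.
    captured["aten_ops"] = [
        str(n.target) for n in gm.graph.nodes if n.op == "call_function"
    ]
    print("After AOTAutograd + decompositions:")
    print(gm.graph)
    # AOTAutograd expects a "boxed" callable from its compilers.
    return make_boxed_func(gm.forward)


aot_backend = aot_autograd(
    fw_compiler=inspect_compiler,
    decompositions=core_aten_decompositions(),
)


def two_stage_backend(gm: torch.fx.GraphModule, example_inputs):
    # This is the high-level graph exactly as Dynamo passes it to a backend.
    captured["dynamo_ops"] = [
        str(n.target) for n in gm.graph.nodes if n.op == "call_function"
    ]
    print("Graph from Dynamo:")
    print(gm.graph)
    return aot_backend(gm, example_inputs)


def f(x):
    return torch.nn.functional.silu(x) + 1


compiled = torch.compile(f, backend=two_stage_backend)
x = torch.randn(8)
out = compiled(x)
```

Comparing `captured["dynamo_ops"]` with `captured["aten_ops"]` shows the change in IR level, but it says nothing about what Inductor does after this point, which is the part I am asking about.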
I’ve found scattered bits around pattern_matcher that do this decomposition, and fx_passes in the Inductor source directory that further lower the normalized IR, but I don’t have a clear picture of the entire flow from the high-level fx.GraphModule to optimized Triton code.
Is there any documentation on the step-by-step flow that Inductor performs, from the input graph it receives from Dynamo to the optimized kernels / graph, and on how these steps map to the torch._inductor source directory?