Question regarding horizontal fusion

Pawl · December 3, 2025, 10:52pm

I am trying to get my head around Inductor. From inductor/fx_passes/ and inductor/codegen, it seems like there is support for both vertical and horizontal fusion. By vertical fusion, I assume what is meant is kernel fusion in which sequential ops are fused together.

For horizontal fusion, if I understand the design correctly: if you have op(A,B) and op(A,C) and can_fuse_horizontal=True, then you can compute something like g(op(A, f(B,C))), where f and g are op-specific transforms that let you fuse the op and then split the result. Is this correct, or am I misunderstanding?
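One way to make f and g concrete, assuming op is matmul and both calls share the left operand A: f concatenates B and C column-wise, op runs once, and g splits the columns back apart. A plain-Python sketch (the helper names are hypothetical; this illustrates the concat-then-split idea in the question, not necessarily what Inductor's horizontal fusion does):

```python
# Concat -> single op -> split, with matmul as the op.
# A @ B and A @ C share A, so both equal slices of A @ [B | C].
# Illustrative only; not Inductor's implementation.

def matmul(a, b):
    """Naive matrix product of row-major nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def concat_cols(b, c):
    """f(B, C): place C's columns to the right of B's (same row count)."""
    return [rb + rc for rb, rc in zip(b, c)]

def split_cols(m, n_left):
    """g(M): split M into its first n_left columns and the remaining ones."""
    return [r[:n_left] for r in m], [r[n_left:] for r in m]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 0, 2], [0, 1, 3]]

fused = matmul(A, concat_cols(B, C))          # one "kernel" instead of two
AB, AC = split_cols(fused, n_left=len(B[0]))
assert AB == matmul(A, B) and AC == matmul(A, C)
```

This only works because the shared operand sits on the same side of both matmuls; the transforms f and g are indeed op-specific.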

And from the code, it seems like CUDA and CUTLASS are not yet supported, and only Triton is?

So if the codegen defaults to using the CUDA ops, then horizontal fusion won’t be possible (for now), but if it chooses Triton, then it might potentially be used? Is my understanding correct?

jansel · December 9, 2025, 9:22pm

Both horizontal and vertical fusion are supported for pointwise and reduction ops when the shapes are compatible. This is true for the Triton, C++, Halide, and Metal backends on all device types. The fusion decisions are made in the Inductor scheduler.
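To make the distinction concrete, here is a toy plain-Python model (not Inductor-generated code) that counts global-memory loads. Vertical fusion keeps the intermediate `a` in a register instead of writing and re-reading it; horizontal fusion lets two independent consumers of `x` share a single load. `Memory`, `Buffer`, and the strategy functions are hypothetical names for illustration.

```python
# Toy model of three pointwise ops over x:
#   a = x + 1; b = a * 2   (b consumes a: vertical-fusion candidate)
#   c = x * 3              (independent of b, also reads x: horizontal candidate)
# Every element load from a Buffer counts as one global-memory read.

class Memory:
    """Tracks the number of element loads across all buffers."""
    def __init__(self):
        self.loads = 0

class Buffer:
    """A materialized tensor in 'global memory'; indexing counts a load."""
    def __init__(self, mem, data):
        self.mem, self.data = mem, list(data)
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        self.mem.loads += 1
        return self.data[i]

def unfused(mem, x):
    n = len(x)
    a = Buffer(mem, (x[i] + 1 for i in range(n)))   # kernel 1: reads x, writes a
    b = Buffer(mem, (a[i] * 2 for i in range(n)))   # kernel 2: re-reads a
    c = Buffer(mem, (x[i] * 3 for i in range(n)))   # kernel 3: re-reads x
    return b.data, c.data

def vertical_only(mem, x):  # mem unused; kept for a uniform signature
    n = len(x)
    b = [(x[i] + 1) * 2 for i in range(n)]          # a never leaves a register
    c = [x[i] * 3 for i in range(n)]                # still a separate pass over x
    return b, c

def vertical_and_horizontal(mem, x):
    b, c = [], []
    for i in range(len(x)):                         # one kernel for both outputs
        xi = x[i]                                   # x is loaded once per element
        b.append((xi + 1) * 2)
        c.append(xi * 3)
    return b, c

loads = {}
for strategy in (unfused, vertical_only, vertical_and_horizontal):
    mem = Memory()
    outputs = strategy(mem, Buffer(mem, range(4)))
    loads[strategy.__name__] = mem.loads
    assert outputs == ([2, 4, 6, 8], [0, 3, 6, 9])
```

All three strategies produce the same results; only the memory traffic differs (12, 8, and 4 loads for four elements).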

There are heuristics to decide when these fusions are profitable, some of which can be controlled via configs such as:

torch/_inductor/config.py (pytorch/pytorch, commit 06b03073b):

    # fuse even in cases without common reads
    aggressive_fusion = False

and:

torch/_inductor/config.py (pytorch/pytorch, commit 06b03073b):

    # should we stop a fusion to allow better tiling?
    tiling_prevents_pointwise_fusion = True
    tiling_prevents_reduction_fusion = True

and many more (search that file for "fusion").
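A minimal config fragment, assuming these options still exist under the same names in your installed PyTorch (the thread cites them at commit 06b03073b; names and defaults may have changed since). They are set on the config module before the compiled function is first called, since that first call is when Inductor compiles. Whether flipping them helps is workload-dependent.

```python
import torch._inductor.config as inductor_config

# Fuse even when two nodes share no common reads (False in the quoted config).
inductor_config.aggressive_fusion = True

# Don't stop fusions for the sake of better tiling (both True in the quoted config).
inductor_config.tiling_prevents_pointwise_fusion = False
inductor_config.tiling_prevents_reduction_fusion = False
```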

For matmul, conv, etc., we support epilogue fusion in max-autotune mode when a Triton or CUTLASS template is selected, plus prologue fusion in some quantization cases.
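To illustrate the terms (not the actual Triton/CUTLASS templates), here is a plain-Python sketch of a matmul with two fused stages. A prologue dequantizes weights as they are loaded, and an epilogue applies bias + ReLU before each result is stored. The per-column scale quantization scheme and all function names are assumptions for illustration. The unfused version shows the separate full-buffer passes that fusion avoids.

```python
# Prologue fusion: transform operands inside the matmul's load path.
# Epilogue fusion: transform results inside the matmul's store path.

def relu(v):
    return v if v > 0 else 0.0

def matmul_with_fusions(a, w_q, w_scale, bias):
    """out[i][j] = relu(sum_k a[i][k] * (w_q[k][j] * w_scale[j]) + bias[j])."""
    m, k_dim, n = len(a), len(w_q), len(w_q[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for k in range(k_dim):
                w = w_q[k][j] * w_scale[j]       # prologue: dequantize on load
                acc += a[i][k] * w
            out[i][j] = relu(acc + bias[j])      # epilogue: bias + ReLU before store
    return out

def matmul_unfused(a, w_q, w_scale, bias):
    w = [[q * s for q, s in zip(row, w_scale)] for row in w_q]        # pass 1: dequantize
    mm = [[sum(a[i][k] * w[k][j] for k in range(len(w)))              # pass 2: matmul
           for j in range(len(w[0]))] for i in range(len(a))]
    return [[relu(v + b) for v, b in zip(row, bias)] for row in mm]   # pass 3: bias + ReLU
```

Usage: `matmul_with_fusions([[1.0, 2.0]], [[1, -2], [3, 4]], [0.5, 0.25], [0.0, -2.0])` computes the same `[[3.5, 0.0]]` as the unfused version, without materializing the dequantized weights or the pre-activation output.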

Pawl · December 10, 2025, 1:48pm

Thank you, this answers all my questions.

Out of curiosity, I also looked up the roadmap, and if I understand correctly, horizontal fusion of matmuls is a pending item. Very cool.

KR3.2 Explicit API for horizontal fusion, foreach_map, that includes grouped gemm. SOTA perf on grouped linear MOE.
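For context on the roadmap item: a grouped GEMM computes several independent matmuls of different shapes (e.g. one per MoE expert) in a single kernel launch. The toy plain-Python sketch below rests on assumptions of its own. The tile size, the row-only tiling, and the names are hypothetical, and this is not foreach_map or any PyTorch API.

```python
# Grouped GEMM idea: enumerate the output tiles of every group up front,
# then process them all in one "launch" loop instead of one launch per group.

TILE = 2  # hypothetical tile size (rows per tile)

def grouped_gemm(groups):
    """groups: list of (A, B) pairs with per-group shapes; returns [A @ B, ...]."""
    outs = [[[0] * len(b[0]) for _ in range(len(a))] for a, b in groups]
    # Flattened work list: (group index, first row of tile) across all groups.
    work = [(g, r) for g, (a, _) in enumerate(groups) for r in range(0, len(a), TILE)]
    for g, r0 in work:                          # single loop over all groups' tiles
        a, b = groups[g]
        for i in range(r0, min(r0 + TILE, len(a))):
            for j in range(len(b[0])):
                outs[g][i][j] = sum(a[i][k] * b[k][j] for k in range(len(b)))
    return outs
```

Flattening the work list is what lets groups with very different row counts (e.g. unevenly loaded experts) share one launch.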
