Inductor CUDA Backend

Is there any documentation on the different CUDA kernel backends that Inductor selects from (CUTLASS, Triton, etc.)?

More specifically, for each backend I would like to understand how modules / layers / ops are mapped to concrete kernel implementations. E.g., for Triton: the autotuning / heuristic selection process, JIT compilation, and stitching of the generated kernel back into the graph. For CUTLASS: the heuristics used for templated kernel generation.



Did you find any info on this?