Hi @jansel, I wonder why Inductor chose Triton to generate CUDA kernels instead of other solutions like TVM / XLA?
@void-main I believe this question was answered earlier in this same thread.
Ah, my bad, I missed the earlier discussion. Thanks for pointing that out, @Lezcano!
So, if I understand correctly, the key reason for not choosing TVM is that TensorIR requires more expert knowledge than Triton to get good performance?
It seems the key reason for choosing Triton is that it focuses on NVIDIA GPU optimizations, whereas the others (TVM/XLA) are not tied to GPUs.
After digging into PyTorch’s Triton matmul template, I think it is fairly general and not bound to GPUs. Hardware vendors can still port Triton and apply their own transforms to this “tiled” language.
However, PyTorch’s Inductor implementation is indeed rather tightly bound to GPUs, which makes it harder to separate Inductor’s core logic from its calls to CUDA APIs.
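To make the “tiled” point a bit more concrete, here is a rough sketch of what I mean, in plain Python rather than Triton. This is not Inductor’s actual template: the function name `tiled_matmul` and the block-size parameters are made up for illustration. The idea is only that the computation is expressed per output tile, and nothing in that structure is GPU-specific; a backend is free to decide how each tile is mapped onto its hardware.

```python
def tiled_matmul(a, b, block_m=2, block_n=2, block_k=2):
    """Multiply two matrices (lists of lists) one output tile at a time.

    Hypothetical illustration only: the block sizes stand in for the kind of
    tile-size parameters a tiled kernel is written against.
    """
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * n for _ in range(m)]

    # Each (m0, n0) pair identifies one output tile. In a tiled language,
    # one program instance would own one such tile.
    for m0 in range(0, m, block_m):
        for n0 in range(0, n, block_n):
            # Walk the reduction dimension in chunks of block_k,
            # accumulating partial products into the output tile.
            for k0 in range(0, k, block_k):
                for i in range(m0, min(m0 + block_m, m)):
                    for j in range(n0, min(n0 + block_n, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + block_k, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b))
```

The result does not depend on the block sizes; they only change how the work is partitioned, which is the part a hardware vendor would tune or transform for their own target.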