Hi @jansel, I wonder why Inductor chose Triton to generate CUDA kernels instead of other solutions like TVM or XLA?
@void-main I believe this question was answered earlier in this same thread.
Ah, my bad, I missed the earlier discussion. Thanks for pointing that out, @Lezcano!
So, if I understand correctly, the main reason for not choosing TVM is that TensorIR requires more expert knowledge than Triton to get good performance?
It seems the main reason for choosing Triton is that it focuses on NVIDIA GPU optimizations, whereas the others (TVM/XLA) are not tied to GPUs.
After digging into PyTorch’s Triton matmul template, I think it is fairly general and not bound to GPUs. Hardware vendors can still port Triton and apply their own transforms to this “tiled” language.
However, PyTorch’s Inductor implementation is indeed rather bound to GPUs, which makes it harder to separate Inductor’s core logic from its calls to CUDA APIs.
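To make the “tiled” idea a bit more concrete, here is a rough sketch in plain Python. It is not Triton and not Inductor's actual template; the function name `tiled_matmul` and the block-size parameters are made up for illustration. It only shows the decomposition I mean: each output tile is computed independently by accumulating over tiles of the K dimension, and nothing in that structure is GPU-specific.

```python
def tiled_matmul(a, b, block_m=2, block_n=2, block_k=2):
    """Multiply two non-empty matrices (lists of lists) tile by tile.

    Hypothetical illustration only: block sizes are arbitrary and the
    loops run sequentially, whereas a tiled language would map each
    (m0, n0) output tile to an independent program instance.
    """
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    if k != k_b:
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * n for _ in range(m)]
    for m0 in range(0, m, block_m):          # one output tile per (m0, n0)
        m1 = min(m0 + block_m, m)            # clamp ragged edge tiles
        for n0 in range(0, n, block_n):
            n1 = min(n0 + block_n, n)
            # Per-tile accumulator, analogous to a block-local buffer.
            acc = [[0.0] * (n1 - n0) for _ in range(m1 - m0)]
            for k0 in range(0, k, block_k):  # walk the K dimension in tiles
                k1 = min(k0 + block_k, k)
                for i in range(m0, m1):
                    for j in range(n0, n1):
                        for kk in range(k0, k1):
                            acc[i - m0][j - n0] += a[i][kk] * b[kk][n0 + j - n0]
            # Write the finished tile back to the output.
            for i in range(m0, m1):
                for j in range(n0, n1):
                    c[i][j] = acc[i - m0][j - n0]
    return c
```

A backend is free to decide how each tile is scheduled and where the accumulator lives, which is why I think the template itself could be ported to other hardware.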
Does the fact that dynamo is only for Linux and Mac mean that contributing to inductor is not possible on Windows?
One easy way to contribute on Windows would be to try that, fix any bugs you run into, and write instructions that other Windows users could follow.
We also welcome pull requests that add native Windows support without WSL.
It works on WSL, yes. I’ve been using the PyTorch nightly build and it works fine.
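If anyone ends up writing setup instructions, a small check like the one below might help readers tell whether they are in WSL or on native Windows. This is only a sketch using the standard library: `classify_platform` is a hypothetical helper, and looking for "microsoft" in the kernel release string is a common heuristic for WSL, not an official API.

```python
import platform


def classify_platform(system, release):
    """Roughly classify the host from platform.system()/platform.release().

    Assumption: WSL kernels report a release string containing
    "microsoft"; this is a heuristic and may not hold everywhere.
    """
    system = system.lower()
    if system == "windows":
        return "native-windows"
    if system == "linux" and "microsoft" in release.lower():
        return "wsl"
    if system == "linux":
        return "linux"
    if system == "darwin":
        return "macos"
    return "other"


if __name__ == "__main__":
    # Report the environment this script is actually running in.
    print(classify_platform(platform.system(), platform.release()))
```

Taking the two strings as arguments keeps the function easy to test without needing each operating system at hand.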