Actually, I want to generate Triton kernels from a PyTorch model when there is no CUDA device, i.e. get Triton code out of torch.compile(m, backend="inductor") when device="cpu". Then I want to get LLVM IR from these Triton kernels using a custom non-GPU Triton backend.