Hello @jansel.
I’m wondering if I can make Inductor generate Triton kernels for all of my operators, without falling back to extern_kernels.
I modified the config as follows, but to no avail:

```python
torch._inductor.config.max_autotune_gemm_backends = "TRITON"  # removed ATEN
torch._inductor.config.max_autotune = True
```
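For reference, here is a minimal sketch of the kind of script that hits this for me. The model and input shapes are illustrative, inferred from the traceback (a [100, 100] weight and a [100] input); only the two config lines above are the actual change:

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.max_autotune_gemm_backends = "TRITON"  # removed ATEN
inductor_config.max_autotune = True

# Illustrative model/shapes, inferred from the traceback:
# primals_1 looks like a [100, 100] weight, primals_3 like a [100] input.
model = torch.nn.Linear(100, 100).cuda()
compiled = torch.compile(model)  # default backend is inductor

x = torch.randn(100, device="cuda")
out = compiled(x)  # fails while lowering aten.mm.default (traceback below)
```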
I get the following error:
File "/home/amodab01/anaconda3/envs/ml_training/lib/python3.11/site-packages/torch/_inductor/kernel/mm.py", line 156, in tuned_mm
return autotune_select_algorithm("mm", choices, [mat1, mat2], layout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amodab01/anaconda3/envs/ml_training/lib/python3.11/site-packages/torch/_inductor/select_algorithm.py", line 991, in autotune_select_algorithm
return _ALGORITHM_SELECTOR_CACHE(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amodab01/anaconda3/envs/ml_training/lib/python3.11/site-packages/torch/_inductor/select_algorithm.py", line 723, in __call__
raise RuntimeError(
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: RuntimeError: No choices to select, please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice.
target: aten.mm.default
args[0]: TensorBox(
ReinterpretView(
StorageBox(
InputBuffer(name='primals_3', layout=FixedLayout('cuda', torch.float32, size=[100], stride=[1]))
),
FixedLayout('cuda', torch.float32, size=[1, 100], stride=[100, 1]),
origins={view}
)
)
args[1]: TensorBox(
ReinterpretView(
StorageBox(
InputBuffer(name='primals_1', layout=FixedLayout('cuda', torch.float32, size=[100, 100], stride=[100, 1]))
),
FixedLayout('cuda', torch.float32, size=[100, 100], stride=[1, 100]),
origins={permute}
)
)
Any idea what I’m missing?