I have a few concerns regarding the newly proposed Core ATen decompositions:
aten::_unsafe_index.Tensor → index.Tensor
aten::_unsafe_index was created as a hint to inductor that the indices originate from a decomposition rather than from the user, and as such can be trusted. This means we don’t need to generate a tl.device_assert call checking that they’re in bounds. Decomposing it to index.Tensor would reintroduce that check and result in worse performance.
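aten::atan2 → atan(input / other)
This is incorrect as it doesn’t select the correct branch of the atan function. Dividing first throws away the individual signs of the operands, so points in opposite quadrants collapse to the same ratio, e.g. atan2(-x, -y) != atan2(x, y) and atan2(-x, y) != atan2(x, -y).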
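A quick check with one point per quadrant shows where the proposed decomposition diverges. The quadrant-corrected version is a sketch of what a correct atan-based decomposition would at least need; it is still incomplete (it ignores signed zeros and y == x == 0), which is part of why atan2 is better left as a primitive.

```python
import math
import torch

# One (y, x) point per quadrant: (1, 1), (-1, -1), (1, -1), (-1, 1).
y = torch.tensor([1.0, -1.0, 1.0, -1.0], dtype=torch.float64)
x = torch.tensor([1.0, -1.0, -1.0, 1.0], dtype=torch.float64)

reference = torch.atan2(y, x)
naive = torch.atan(y / x)  # the proposed decomposition

# Sketch of a quadrant fix: add +pi or -pi when x < 0.
# Not handled: signed zeros and the y == x == 0 case.
pi = torch.full_like(x, math.pi)
correction = torch.where(y >= 0, pi, -pi)
corrected = naive + torch.where(x < 0, correction, torch.zeros_like(x))

print(reference)  # quadrants 2 and 3 differ from `naive`
print(naive)
```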
aten::diagonal
Decomposing views into as_strided should be discouraged, because the aten::diagonal call carries far more semantic information, which inductor uses to generate much more efficient code.
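For a contiguous 2-D tensor, the diagonal view can be rewritten as an as_strided call whose single stride is stride(0) + stride(1). The sketch below shows that the two produce the same view; after the rewrite, a backend only sees a size, a stride and an offset, not the fact that this is a diagonal.

```python
import torch

x = torch.arange(12.0).reshape(3, 4)  # contiguous, strides (4, 1)

d = torch.diagonal(x)

# Equivalent view spelled with raw sizes/strides (valid for this 2-D case).
s = torch.as_strided(
    x,
    size=(min(x.shape),),
    stride=(x.stride(0) + x.stride(1),),
    storage_offset=x.storage_offset(),
)

print(d.tolist())  # [0.0, 5.0, 10.0]
print(d.stride())  # (5,)
```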
aten::div.Tensor_mode, aten::floor_divide → floor(divide(x, y))
This decomposition gives different results from Python’s floor division. Inductor currently does this decomposition, but I don’t think it should be baked in for export.
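A small float64 example of the mismatch: 1.0 / 0.1 rounds to exactly 10.0 before floor is applied, whereas Python's // computes the floored quotient more carefully and returns 9.0. The sketch assumes torch.div(..., rounding_mode="floor") follows Python's // semantics, as its documentation states.

```python
import math
import torch

a, b = 1.0, 0.1

print(a // b)             # 9.0  (Python floor division)
print(math.floor(a / b))  # 10   (a / b rounds to exactly 10.0 first)

x = torch.tensor([a], dtype=torch.float64)
y = torch.tensor([b], dtype=torch.float64)

decomposed = torch.floor(torch.div(x, y))        # floor(divide(x, y))
native = torch.div(x, y, rounding_mode="floor")  # aten::div.Tensor_mode

print(decomposed.item(), native.item())
```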
aten::expm1, aten::log10, aten::log1p, aten::log2
These are not just convenience functions: they give more numerical precision than their decomposed forms, so they shouldn’t be decomposed.
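The precision loss is easy to see for small arguments in float32, where exp(1e-8) and 1 + 1e-8 both round to exactly 1.0, so the decomposed forms return 0. The log10 line uses plain Python floats to show that even change of base is not exact.

```python
import math
import torch

t = torch.tensor(1e-8, dtype=torch.float32)

# Decomposed forms: the information is lost before the subtraction / log.
naive_expm1 = torch.exp(t) - 1
naive_log1p = torch.log(1 + t)

# Dedicated ops keep full precision for small inputs.
fused_expm1 = torch.expm1(t)
fused_log1p = torch.log1p(t)

print(naive_expm1.item(), fused_expm1.item())
print(naive_log1p.item(), fused_log1p.item())

# log10 vs. change of base, in float64.
print(math.log10(1000.0))                 # 3.0
print(math.log(1000.0) / math.log(10.0))  # 2.9999999999999996
```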
aten::var_mean.correction → return mean(x), var(x)
Inductor implements a single-pass var_mean, which already computes the mean as part of the variance. After the decomposition, the mean computed inside var is currently not CSE’d with the separate mean reduction, so this should result in worse performance.
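For illustration, a single pass over the data can yield both statistics at once. The sketch below uses Welford's online algorithm; it is not necessarily what inductor generates, and welford_var_mean is a hypothetical helper. It is checked against torch.var_mean, assuming PyTorch 2.0 or later for the correction keyword.

```python
import math
import torch

def welford_var_mean(values, correction=1):
    """Hypothetical helper: one pass over `values`, returns (variance, mean)."""
    count, mean, m2 = 0, 0.0, 0.0
    for v in values:
        count += 1
        delta = v - mean
        mean += delta / count     # running mean
        m2 += delta * (v - mean)  # running sum of squared deviations
    return m2 / (count - correction), mean

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var, mean = welford_var_mean(data)
ref_var, ref_mean = torch.var_mean(torch.tensor(data, dtype=torch.float64), correction=1)

print(var, mean)
print(ref_var.item(), ref_mean.item())
```

Computing mean(x) and var(x) separately would need two reductions over the same data; the fused form needs one.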