@peterbell10 and @Lezcano thank you for your feedback. Based on your comments, we are making the following changes to the list of identified core operators.
- Operators which cleanly map to hardware intrinsics will be promoted to core operators. The same treatment applies to operators where decomposition would impact the numerical precision/stability of the output (see the first sketch after this list). In accordance with this, the following operators, which were previously decomposed, are now added to core:
  - `aten::trunc`
  - `aten::expm1`
  - `aten::log10`
  - `aten::log1p`
  - `aten::log2`
  - `aten::atan2`
- `div.Tensor_mode` and `div.Scalar_mode` have been added as "core" operators. The `"trunc"` and `"floor"` rounding modes are more complex to decompose than initially thought: both need to handle floating-point and integer data types separately, and `"floor"` in particular must replicate Python's floor division behaviour (see the second sketch after this list). The decomposition for this operator would be too similar to an outright implementation of the operator, which is why it is preferable to add it as a "core" operator.
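To make the precision point concrete, here is a minimal illustration (my example, not part of the proposal itself) of why decomposing `aten::log1p` into `log(1 + x)` is lossy:

```python
import torch

x = torch.tensor([1e-10], dtype=torch.float32)

# Decomposed form: 1 + 1e-10 rounds to exactly 1.0 in float32
# (float32 epsilon is ~1.19e-7), so the log of the sum is 0.0.
print(torch.log(1 + x))  # tensor([0.])

# Dedicated kernel: log1p(x) ~= x for small x, so precision is kept.
print(torch.log1p(x))    # tensor([1.0000e-10])
```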
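And a rough sketch of what the `"floor"` integer path has to do to match Python's floor division semantics; this is illustrative only (the helper name is mine), not PyTorch's actual decomposition:

```python
import torch

def div_floor_sketch(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    if a.is_floating_point() or b.is_floating_point():
        # Floating point can simply floor the true quotient.
        return torch.floor(a / b)
    # Integer path: start from truncating division, which rounds
    # toward zero rather than toward negative infinity...
    q = torch.div(a, b, rounding_mode="trunc")
    # ...then step down by one wherever truncation rounded the wrong
    # way: opposite signs with a nonzero remainder.
    r = a - q * b
    return torch.where((r != 0) & ((a < 0) != (b < 0)), q - 1, q)

a = torch.tensor([7, -7, 7, -7])
b = torch.tensor([2, 2, -2, -2])
print(div_floor_sketch(a, b))                  # tensor([ 3, -4, -4,  3])
print(torch.div(a, b, rounding_mode="floor"))  # matches
```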
Despite these changes, there are still some additional considerations I am working through.
- For `aten::diagonal`, as @peterbell10 called out, decomposing into `as_strided` is not ideal (sketched below). I am in favor of moving this to a core op as well, but need to confirm this decision internally.
- We are still undecided on how to handle `var_mean.correction`. We can remove this decomposition for Inductor, but need to determine if there is a need to add the op as core so that the single-pass algorithm can be accessed (sketched below).
- As a general point for the `.Scalar` variants of ops, should they be decomposed by using `full` to construct a tensor argument from the Scalar argument, then calling the `.Tensor` variant (sketched below)?
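For reference, a minimal sketch of the `as_strided`-based decomposition of `aten::diagonal` under discussion, restricted to the 2-D, zero-offset case (the helper name is mine; the real op also handles offsets and arbitrary dim pairs):

```python
import torch

def diagonal_via_as_strided(x: torch.Tensor) -> torch.Tensor:
    # The diagonal is a view that strides through both dims at once,
    # which is exactly the kind of aliasing that backends without
    # first-class striding support struggle to consume.
    n = min(x.size(0), x.size(1))
    return x.as_strided((n,), (x.stride(0) + x.stride(1),), x.storage_offset())

x = torch.arange(12).reshape(3, 4)
assert torch.equal(diagonal_via_as_strided(x), x.diagonal())
```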
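On `var_mean.correction`: the single-pass algorithm in question is, I assume, Welford's method. A scalar-loop sketch of it, to show what is lost if the op is decomposed into separate `mean` and `var` reductions (which requires two passes over the data):

```python
import torch

def welford_var_mean(x: torch.Tensor, correction: int = 1):
    # Single pass: update the running mean and the sum of squared
    # deviations (m2) together for each element.
    mean = torch.zeros((), dtype=x.dtype)
    m2 = torch.zeros((), dtype=x.dtype)
    for n, v in enumerate(x, start=1):
        delta = v - mean
        mean = mean + delta / n
        m2 = m2 + delta * (v - mean)  # note: uses the updated mean
    var = m2 / (x.numel() - correction)
    return var, mean

x = torch.randn(100, dtype=torch.float64)
var, mean = welford_var_mean(x)
ref_var, ref_mean = torch.var_mean(x, correction=1)
assert torch.allclose(var, ref_var) and torch.allclose(mean, ref_mean)
```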
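And to make the last question concrete, the pattern being asked about would look roughly like this (the helper name is hypothetical):

```python
import torch

def add_scalar_decomp(x: torch.Tensor, scalar) -> torch.Tensor:
    # The .Scalar overload's type promotion treats the scalar
    # "weakly", so compute the result dtype explicitly rather than
    # borrowing x.dtype.
    dtype = torch.result_type(x, scalar)
    # Materialize the scalar as a 0-d tensor with full() ...
    s = torch.full((), scalar, dtype=dtype, device=x.device)
    # ... then dispatch to the .Tensor variant.
    return torch.add(x.to(dtype), s)

x = torch.arange(4)  # int64
assert torch.equal(add_scalar_decomp(x, 2.5), torch.add(x, 2.5))
```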