With Brian Hirsh, Meghan Lele, Heitor Schueroff, Maksim Levental, Natalia Gimelshein, Zheng Yan, Jeffrey Wan, Rong Rong, Basil Hosmer, Joel Schlosser, Alex Suhan, Kushashwa Ravi Shrimali, Yukio Siraichi, Ivan Yashchuk, Kshiteej K aka kshitij12345, Xue Haotian aka Kiyosora, Eddie Y aka eqy, Freey0, Yuanqiang Liu aka qingyunqu, Elton Leander Pinto aka 1ntEgr8, Xiong Wei aka RockingJavaBean
Structured kernels are a new way of writing kernels in PyTorch that separates the meta computation (computing the output dtype and shape) from the actual implementation of a kernel (computing the data in the tensor). From a developer perspective, structured kernels make it easier to write the functional, inplace and out variants of a kernel in a consistent manner (add, add_ and add_out); from a user perspective, structured kernels are used to implement meta tensors, which act and behave like normal tensors except that they bypass all data computation. Meta tensors have a variety of use cases: they are ordinarily used as an API to run shape inference on PyTorch operators (mlir-npcomp, functorch), but they can also be used to represent and manipulate models with large parameters without actually materializing those parameters (e.g., Skipping Module Parameter Initialization — PyTorch Tutorials 1.9.0+cu102 documentation).
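To make the user-facing side concrete, here is a small sketch of meta tensors in action (a minimal example, assuming a PyTorch build where add has a meta implementation; printed values are illustrative):

```python
import torch

# Meta tensors carry only metadata (shape, dtype, device); no storage is
# allocated and no data computation is performed.
x = torch.empty(1024, 1024, device="meta")
y = torch.empty(1024, 1024, device="meta")

# Running an operator only executes its meta (shape/dtype inference) computation.
z = torch.add(x, y)
print(z.shape, z.device)  # torch.Size([1024, 1024]) meta
```

Because no data is ever materialized, this is also what makes it cheap to instantiate very large models, as in the parameter-initialization-skipping tutorial linked above.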
Lookback
As of today (adb85b32d3), 187 distinct operators have been ported to structured (468 when including out/inplace overloads), out of 508 operators that are eligible to be structured (1179 including inplace/out overloads); another 971 operators cannot obviously be made structured (e.g., the operator only has a functional variant and not an out variant, or the operator is composite). Operator support for meta tensors extends beyond structured kernels support, since composite kernels automatically support meta tensors (assuming the underlying operators they call support meta tensors). In practice, these important classes of operators are also supported with meta tensors (see the example after the list):
- Factory functions (including random functions)
- View functions
- Inplace functions (we’ve explicitly hardcoded these to noop with meta tensors)
- Serialization
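As a quick illustration of the factory, view and inplace cases listed above (a hedged sketch; the exact set of supported ops may vary by version):

```python
import torch

t = torch.randn(4, 4, device="meta")  # factory (and random) functions work on meta
v = t.view(16)                        # view functions propagate metadata only
v.add_(1.0)                           # inplace functions are no-ops on meta tensors
print(v.shape, v.device)              # torch.Size([16]) meta
```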
This means that if you hit a case where an operator does not support meta tensors, it is most likely a gap in that specific operator rather than in the infrastructure, and you should let us know about it at the tracking issue. Some current notable omissions (an example failure is sketched after the list):
- sum/all/mean/max
- masked_select/index_select/take
- cat
- cdist
- EmbeddingBag/BatchNorm/ReLU/BCELoss/MaxPool3d
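If you do hit one of these omissions, the failure mode is an error from the dispatcher rather than silently wrong results. A hedged sketch (the exact error type and message may differ across versions):

```python
import torch

x = torch.randn(8, 8, device="meta")
try:
    x.sum()  # listed above as not yet supported for meta tensors at time of writing
except RuntimeError as e:  # missing-kernel errors surface as (a subclass of) RuntimeError
    print("not yet supported on meta:", e)
```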
Some operators were blocked due to lack of infrastructural support; Meghan Lele recently added infrastructural improvements to unblock structured kernels for optional tensors and optional scalars.
Lookahead
We have a few folks at Quansight (Yukio and Kushashwa) who plan to continue working on structured kernels support, and the composability team is continuing to support these efforts; however, we do not have any roadmap items for larger feature development or pushes on structured kernels. We are hoping that at this leisurely pace (with some occasional tactical fixes) we will hit enough coverage to handle upcoming use cases. There are still blockers for making certain classes of kernels structured (currently, the most notable example is operators that take lists of tensors). However, it is always possible to manually unblock a kernel for meta tensor support by writing the meta implementation by hand (at the cost of a modest amount of code duplication), so as long as you only need a few operators, we are confident they can be unblocked when necessary.
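For illustration, "writing the meta implementation by hand" amounts to writing only the shape/dtype inference for an operator and producing a meta output. Below is a minimal sketch using a hypothetical helper that mimics cat (one of the list-of-tensors operators mentioned above); the real unblocking happens by registering such a function inside PyTorch itself, not via user-level code like this:

```python
import torch

# Hypothetical, illustration-only helper: the "meta implementation" of cat,
# i.e. pure shape/dtype inference with no data computation.
def cat_meta(tensors, dim=0):
    out_shape = list(tensors[0].shape)
    out_shape[dim] = sum(t.shape[dim] for t in tensors)
    return torch.empty(out_shape, dtype=tensors[0].dtype, device="meta")

a = torch.empty(2, 3, device="meta")
b = torch.empty(4, 3, device="meta")
print(cat_meta([a, b]).shape)  # torch.Size([6, 3])
```

The duplication mentioned above comes from the fact that this shape logic also lives in the real kernel; structured kernels exist precisely to write it once and share it.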
This project wouldn’t have been possible without everyone who has contributed to making kernels structured. Thank you!