TL;DR
Folks from across the Meta-internal PyTorch Core, PyTorch Edge, and PyTorch Compiler teams collaborated to review a list of commonly used ATen operators and discussed whether each should be added to the core ATen operator set or be decomposed by the core ATen decomposition table.
Our goal is to define a core operator set for the ATen library that fulfills the following criteria:
 The core ATen operator set can be used as a reference for which ATen ops should be handled by backends or compilers that consume models exported by PT2.
 The decompositions implied by the core ATen operator set are useful to the vast majority of use cases
 The vast majority of use cases will not want to decompose operators contained in the core ATen operator set
The purpose of having this core operator set is to help PyTorch communicate a stable set of operators that developers can expect their models to produce, constraining the number of operators that must be implemented in a custom runtime or handled by a custom compiler backend to a manageable quantity. It also facilitates smoother integration with external ML frameworks, such as MLIR, XLA, and ONNX.
We invite you to review our decisions, and provide any feedback you may have!
Context: Why is a core operator set needed?
As the PyTorch ecosystem grows, there is an increasing demand to convert PyTorch models into specialized representations that run performantly and efficiently in specific environments. Specific examples today are TorchInductor and ExecuTorch; both consume the same FX graph representation of a model, but end up producing distinct programs to execute the model in their own distinct runtimes. As more backends are developed, it becomes critical for PyTorch to define a core operator set. This initiative is also a common request from neighboring ML frameworks, such as MLIR/XLA and ONNX, so as to facilitate smoother integration with PyTorch.
There are over 3,000 operators registered in the ATen library; this is a huge number of operators for backend authors to worry about. To exacerbate the issue, many of these operators are redundant, often being slight variants of other operators (e.g. in-place variants, out variants). However, by defining a core operator set, PyTorch can communicate a stable set of operators that developers can expect their models to produce, constraining the number of operators that must be implemented in a custom runtime or handled by a custom compiler backend to a manageable quantity.
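This redundancy is visible on a single op: add alone registers many variants. A quick illustration using the torch.ops namespace:

```python
import torch

# Each ATen op is an "overload packet" carrying multiple registered variants.
print(torch.ops.aten.add.overloads())   # functional overloads, e.g. ['Tensor', 'Scalar', 'out', ...]
print(torch.ops.aten.add_.overloads())  # the corresponding in-place variants
```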
Defining the Core Operator Set
The core ATen operator set can be interpreted as the result of reducing the set of all operators registered to ATen through the process of decomposing operators. "Decomposing" an operator involves expressing it as a combination of other operators; such decompositions are currently defined in decomposition.py. During the export process, a default list of decompositions is used; this is known as the core ATen decomposition table. Thus, the core ATen operator set can be interpreted as the list of operators registered to ATen that are not further decomposed.
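To make this concrete, a decomposition is essentially a function that re-expresses one op using other ops. A simplified sketch in the style of the entries in decomposition.py (illustrative only, not the exact source):

```python
import torch
from torch import Tensor

# Sketch: hardswish re-expressed in terms of simpler ATen ops.
# In decomposition.py, such a function would be registered with
# @register_decomposition so the exporter can apply it.
def hardswish_decomposition(x: Tensor) -> Tensor:
    # hardswish(x) = x * relu6(x + 3) / 6
    return x * torch.clamp(x + 3, min=0, max=6) / 6
```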
In general, we define an "operator set" as the list of operators that will be produced when performing a model export with a specific "decomposition table". Thus, the core ATen operator set is the list of operators that a model can contain when exported with the core ATen decomposition table.
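With the PT2 export APIs, this looks roughly like the following minimal sketch (calling run_decompositions() with no arguments applies the default core ATen decomposition table):

```python
import torch
from torch.export import export

class M(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.hardswish(x)

ep = export(M(), (torch.randn(4, 4),))
# Applying the core ATen decomposition table rewrites decomposable ops;
# ops without an entry in the table survive into the final graph.
core_ep = ep.run_decompositions()
print(core_ep.graph_module.code)
```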
@SherlockNoMad had previously begun work defining the core ATen opset; the list of ops he identified as belonging in the core IR can be found here: IRs - PyTorch 2.0 documentation. This list was seeded with operators that appeared in 163 open-source models used as PT2 benchmarks from across torchbench, HuggingFace, and TIMM. At that point, the general criterion for inclusion was whether a particular ATen operator could be "easily" decomposed into other ATen operators.
The results we are presenting now are a continuation of Sherlock's previous work. We follow the same overall process of manually inspecting ops that appear in a body of surveyed models. However, in this iteration we have taken on the additional goals of:
 Defining and codifying the criteria used to evaluate whether a particular op should be part of ATen's core operator set
 Developing a democratized process where a diverse set of groups across PyTorch interested in this work (i.e. Inductor, Edge, Compiler) can provide input regarding what should/shouldn't be included in the core operator set, with the discussion and results transparent to the broader PyTorch community
 Describing the process for evolving this operator set over time; this involves adding new operators to the core set, as well as adapting the existing core operator set to changes in function schemas and to new operators added to ATen
Our end goal is to develop a stable core ATen operator set that fulfills the following criteria:
 The core ATen operator set can be used as a reference for which ATen ops should be handled by backends or compilers that consume models exported by PT2.
 The decompositions implied by the core ATen operator set are useful to the vast majority of use cases
 The vast majority of use cases will not want to decompose operators in the core ATen operator set
The core operator set represents all ATen operators for which we have made an explicit decision that they should not be decomposed by the core ATen decomposition table. There are operators that are not decomposed by the core decomposition table but are also not part of the core ATen operator set; this means that these operators have either not yet been evaluated, or a decision has not yet been made for them.
Note that the intention is not for users to be locked into using the core ATen decomposition table; backends are free to add or remove decompositions as they wish. The core operator set strives to be a common denominator across different use cases and contexts, but we encourage backends to further fine-tune the decomposition table, and therefore the resulting operator set, for their specific goals.
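As a hedged sketch of what that customization can look like, using the helper in torch._decomp: a backend with, say, a native hardswish kernel could start from the core table and drop that entry.

```python
import torch
from torch._decomp import core_aten_decompositions

decomp_table = core_aten_decompositions()
# Keep aten.hardswish intact for a backend that implements it natively:
decomp_table.pop(torch.ops.aten.hardswish.default, None)

# The customized table is then passed at export time, e.g.:
# ep = torch.export.export(model, example_inputs)
# ep = ep.run_decompositions(decomp_table=decomp_table)
```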
Results
Folks from across the Meta-internal PyTorch Core, PyTorch Edge, and PyTorch Compiler teams came together to review a list of commonly used ATen operators and discussed whether each should be added to the core ATen operator set or be decomposed by the core ATen decomposition table.
The list of operators under consideration was obtained by extracting the operators used by approximately 10,000 nn.Modules tested in pytorch-jit-paritybench, which is "A test suite to measure TorchScript parity with PyTorch on many nn.Modules crawled from popular GitHub projects." The idea was that by looking at operators which are explicitly used in models, we can target the most high-impact ATen operators.
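As an illustration of how such a survey can be carried out per model, one can export the module and tally the ATen ops in the resulting graph (a sketch; `ep` is assumed to be a PT2 ExportedProgram):

```python
from collections import Counter

# Tally the ATen ops appearing in an exported graph.
# `ep` is assumed to be a torch.export.ExportedProgram.
op_counts = Counter(
    str(node.target)
    for node in ep.graph.nodes
    if node.op == "call_function"
)
print(op_counts.most_common(10))
```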
The results of our decisions are summarized below.
Operators Added to the Core ATen Operator Set
For the operators listed below, [core aten] Add ops to core aten set by angelayi · Pull Request #107766 · pytorch/pytorch · GitHub adds the "core" tag to each in native_functions.yaml. Since IRs - PyTorch 2.0 documentation is generated by searching through operators with the "core" tag in native_functions.yaml, these operators will eventually be reflected on that page as well.
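The "core" tag is also queryable at runtime, which makes it easy to check whether a given overload belongs to the set:

```python
import torch

# Tags declared in native_functions.yaml are exposed on each op overload.
op = torch.ops.aten.add.Tensor
print(torch.Tag.core in op.tags)  # True once the overload carries the "core" tag
```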
| Operator | Reason / Comment |
| --- | --- |
| aten::adaptive_avg_pool1d | avg_pool ops are to be added to core; the adaptive versions should be added as well, being related operators. Decomposing to avg_pool2d involves calculating kernel sizes based on the input tensor sizes, which in our view strays into the territory of operator implementation. |
| aten::_adaptive_avg_pool3d | Same reasoning as aten::adaptive_avg_pool1d. |
| aten::_cdist_forward | Decomposition is difficult/impossible. As additional supporting evidence, Inductor lowers this. |
| aten::_embedding_bag | embedding will be added to core, so embedding_bag should be added by extension; it should not be decomposed, since the purpose of this op is to avoid instantiating intermediate embeddings. |
| aten::_local_scalar_dense | Required, since .item() lowers to this. |
| aten::_native_batch_norm_legit_no_training | Added due to how common the op is. For performance reasons, users may not want to decompose the batch_norm op. As additional supporting evidence, batch_norm is also part of StableHLO. Note that other functional variants of batch normalization will be added to the core operator set as well. |
| aten::_pdist_forward | Decomposition is difficult/impossible. As additional supporting evidence, Inductor lowers this. |
| aten::any | Decomposition is difficult/impossible. |
| aten::any.dim | A variant of any that reduces only a single dim; any and any.dim cannot be represented in terms of each other. Unfortunately, any.dims (a variant which reduces across an arbitrary number of dimensions) does not exist; if it did, both any and any.dim could be decomposed to it. |
| aten::avg_pool1d | avg_pool2d is already part of core. There is no generic avg_pool operator, so avg_pool1d and avg_pool3d should be added as well. |
| aten::avg_pool3d | Same reasoning as aten::avg_pool1d. |
| aten::bitwise_and.Scalar | The .Tensor variant is already part of core. |
| aten::bitwise_or.Scalar | The .Tensor variant is already part of core. |
| aten::bitwise_xor.Scalar | The .Tensor variant is already part of core. |
| aten::ceil | This op also exists in ONNX. |
| aten::clamp.Tensor | Essentially the Tensor variant of clamp, which is already in core. |
| aten::cumsum | This op also exists in ONNX. |
| aten::embedding | This op will be difficult to quantize if it is decomposed. |
| aten::floor | Similar to ceil. |
| aten::fmod.Scalar | The .Tensor variant is already part of core. |
| aten::index_put | Decomposition is difficult/impossible. |
| aten::index.Tensor | Decomposition is difficult/impossible. |
| aten::logical_xor | Other logical_* ops are already part of core. |
| aten::mean | Decomposition is difficult/impossible. |
| aten::mean.dim | A variant of mean that reduces only a single dim; mean and mean.dim cannot be represented in terms of each other. Unfortunately, mean.dims (a variant which reduces across an arbitrary number of dimensions) does not exist; if it did, both mean and mean.dim could be decomposed to it. |
| aten::pixel_shuffle | Decomposition is difficult/impossible. |
| aten::prod | A reduction similar to sum, which is already part of core. |
| aten::prod.dim_int | Related to prod, but reduces only a single dim. Unfortunately, a prod.dims variant that could express both prod and prod.dim_int does not exist. |
| aten::rand | Cannot be decomposed; additionally, it can be used in decompositions for various probability-distribution generator operators. |
| aten::randperm | Decomposition is difficult/impossible. |
| aten::reflection_pad1d | Decomposition is difficult/impossible. |
| aten::reflection_pad2d | Decomposition is difficult/impossible. |
| aten::reflection_pad3d | Decomposition is difficult/impossible. |
| aten::remainder.Scalar | The .Tensor variant is already part of core. |
| aten::roll | Decomposition is difficult/impossible. |
| aten::round | Already part of ONNX. |
| aten::scatter.src | Decomposition is difficult/impossible; also part of ONNX. |
| aten::scatter.value | Scalar variant of scatter.src. |
| aten::select_scatter | Reverse operation of select, which is already part of core (and part of StableHLO). |
| aten::sort | Also exists in StableHLO; cannot be decomposed easily. |
| aten::split_with_sizes | split is already part of ONNX, and split decomposes to this. |
| aten::squeeze.dims | Already part of ONNX. |
| aten::tan | Although this can be decomposed to sin(x)/cos(x), this op also exists in ONNX. |
| aten::unsqueeze | Reverse operation of squeeze; it is also used in many decompositions, and is part of ONNX as well. |
| aten::var.correction | Decomposition is difficult/impossible. |
Operators for which decompositions will be added to the Core ATen decomposition table
The table below contains operators which we have reviewed and decided should be decomposed by default by the core ATen decomposition table. For some of these operators, a decomposition is already registered in the codebase but has not yet been added to the core ATen decomposition table.
| Operator | Potential Decomp |
| --- | --- |
| aten::_trilinear | (i1.unsqueeze(expand1) * i2.unsqueeze(expand2) * i3.unsqueeze(expand3)).sum(sumdim) |
| aten::_unsafe_index.Tensor | index.Tensor |
| aten::_unsafe_view | view() |
| aten::all.dim | logical_not(any.dim(logical_not(x))) |
| aten::arange.start | Decomp exists |
| aten::atan2 | atan(input / other) |
| aten::baddbmm | Inductor has a decomp already; make sure it is added to the core list |
| aten::bernoulli | Transformation on rand() |
| aten::bernoulli_.float | Transformation on rand() |
| aten::clamp_max | clamp(x, max=max) |
| aten::clamp_min | clamp(x, min=min) |
| aten::copy | _to_copy() |
| aten::diagonal | Decomp for diagonal exists |
| aten::div.Tensor_mode | div followed by trunc or floor, depending on rounding_mode |
| aten::elu | min(alpha * (exp(x) - 1), 0) + max(x, 0) |
| aten::empty_like | empty() |
| aten::expm1 | exp(x) - 1 |
| aten::exponential_ | Transformation on rand() |
| aten::floor_divide | floor(divide(x, y)) |
| aten::floor_divide_.Tensor | floor(divide(x, y)) |
| aten::full_like | full() |
| aten::glu | split + sigmoid + mul |
| aten::hann_window.periodic | pow(sin(pi * x / (N - 1)), 2) |
| aten::lift_fresh | Gets decomposed to a no-op in core ATen IR |
| aten::log10 | log(x) / log(10) |
| aten::log1p | log(1 + x) |
| aten::log2 | log(x) / log(2) |
| aten::max | return aten::amax(x), aten::argmax(x) |
| aten::min | return aten::amin(x), aten::argmin(x) |
| aten::mish_ | x * tanh(softplus(x)) |
| aten::normal.Tensor_float | Transformation on rand() |
| aten::normal.Tensor_Tensor | Transformation on rand() |
| aten::pow.Scalar | full + pow.Tensor_Tensor |
| aten::rand_like | rand() |
| aten::randint | rand() |
| aten::randint.low | rand() |
| aten::randn_like | rand() |
| aten::resize | view() |
| aten::split.Tensor | Decomp exists |
| aten::squeeze | squeeze.dims |
| aten::std.correction | sqrt(var(x)) |
| aten::sum | sum.dim_IntList(x, []) |
| aten::unbind | Decomp exists |
| aten::uniform_ | Transformation on rand() |
| aten::unsafe_split.Tensor | split.Tensor |
| aten::var | var.correction |
| aten::var_mean.correction | return mean(x), var(x) |
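Many of these decomps are easy to sanity-check numerically. For example, the proposed elu decomposition, specialized to the default alpha = 1.0:

```python
import torch

x = torch.randn(1000)
reference = torch.nn.functional.elu(x)
# min(alpha * (exp(x) - 1), 0) + max(x, 0), with alpha = 1.0
decomposed = torch.minimum(torch.expm1(x), torch.zeros_like(x)) + torch.clamp(x, min=0)
torch.testing.assert_close(reference, decomposed)
```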
We invite you to review these decisions and let us know if there are any operators that you think are misclassified.
The framework we have been using to decide whether an operator should be part of the core ATen operator set or decomposed is described below. Note that these should be interpreted as "rules of thumb" that guide decisions rather than dictate them.

The core operator set can only contain functional operators; in-place and out variant operators are excluded by default.
 During the export process, in-place and out variant operators are replaced by their functional equivalents due to functionalization (see the sketch after this list).
 e.g. aten::gelu_ will be functionalized into aten::gelu

Core ATen decompositions should be fairly straightforward; a decomposition should not cross into the territory of being an outright implementation of the operator being decomposed. For example, if a decomposition for an operator introduces many additional ops into the graph, or requires several computations to produce, then we should prefer to add the operator as a core operator.
 Introducing many additional ops into the graph also has performance implications, such as increasing memory reads/writes during computation and requiring more memory for intermediate tensors; for this reason, we prefer to keep decompositions simple.
 Decompositions are not the appropriate layer for complex implementation logic for specific operators; thus, if a decomposition is possible but complex, we should prefer to retain the original operator.

Decompositions must be deterministic: once a model is exported, the decompositions applied must remain valid even if the properties of the input tensors (such as tensor sizes and data type) are varied.
 As a general rule, it is fine for decompositions to use the rank of a tensor, since the rank of a tensor is fixed during the export process.
 It is also fine for decompositions to use the symbolic shape of a tensor.

Whether an operator is included in other stable operator sets, such as ONNX or StableHLO, is also a factor in our decisions. The goal here is to maximize compatibility with external frameworks.
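A minimal sketch of functionalization in action during export; the in-place aten::add_ disappears from the exported graph:

```python
import torch
from torch.export import export

class M(torch.nn.Module):
    def forward(self, x):
        y = x.clone()
        y.add_(1)  # in-place aten::add_
        return y

ep = export(M(), (torch.randn(3),))
# The exported graph contains the functional aten.add, not aten.add_.
print(ep.graph)
```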
Following Up
Of course, even after this iteration of development, there are still many ATen operators that we have not yet looked at. There is a long tail of specialized operators registered in the ATen library, and exhaustively inspecting each one would be extremely time-consuming and may not be practical. We hope that after this iteration, most ops that show up in models in practice will have been accounted for. Nonetheless, it is imperative that we set up a framework for evolving this operator set going forward.
Considering additional operators for addition to the core operator set can be quite straightforward:
 Internally, we have a workchat group to coordinate discussion around the core ATen operator set; please let me know if you would like to be added. For internal use cases, discussion can occur in these forums.
 Externally, we can monitor open source issues for specific operators that PyTorch clients want to be considered for adding to the core operator set.
A more complex evolution case is when the function schema of existing ops changes, or when additional ops are registered that enable decompositions that were not possible before.
 When a function schema changes, this may result in additional ATen operators being added. If the schema of a core ATen op is changed, causing a variant of that op to be produced, then the added variant may need to be added as a core ATen operator as well. If the variant can be decomposed, then we should decompose it as part of the core ATen decomp table.
 We are working on an op schema versioning system for PT2 export, similar to the TorchScript op versioning system, which may be relevant here.
 When an additional op is added that enables certain decompositions, this may have the implication that some existing core ATen operators can now be decomposed. In these cases, the best choice may be to add the new operator as a core operator while retaining the existing core operators for the purposes of stability. The old core operators may then be periodically deprecated with significant PyTorch version updates.
 One similar case to consider is if someone wishes to modify an existing decomposition rule in the core ATen decomposition table so that it produces a different set of operators. Even if the new operators produced are still all part of the core ATen operator set, models which trigger the decomposition will now use an alternative operator set. This has implications for model deployment, where users may employ selective build to include only the minimal set of operators required to execute the model: re-exporting the model may produce a model that contains operators not included in their build. In this case, the best choice may be a similar approach; that is, to prefer keeping existing decompositions in the core ATen decomposition table stable until significant PyTorch version updates.
As for evaluating additional operators, we plan to do this on an as-needed basis going forward. If there are operators that you would like us to consider adding to the core ATen operator set, please let us know!