Set of ops for a backend to register

Sujoy_Saraswati · February 2, 2023, 1:00pm

Hi,
I have a query on the set of ops that a backend must register with PyTorch 2.0. As per the "IRs" page of the PyTorch master documentation, PyTorch 2.0 offers two sets of IRs for backends to interface with: Core ATen IR and Prims IR.
The Core Aten IR is fully functional and doesn’t have inplace or _out variations. However, does PyTorch 2.0 decompose torch ops into the Core Aten IR ops only when a python frame is passed via torch.compile? If the execution is not via torch.compile, or if torch.compile falls back to eager mode of execution, is there a way for the backends to still get only Core Aten IR ops? If this isn’t possible, then should a backend register other ops outside of Core Aten IR, including inplace and _out variations, to support eager execution?
Regards,
Sujoy

wconstab · February 7, 2023, 9:10pm

@SherlockNoMad can you comment here?

SherlockNoMad · February 8, 2023, 5:10am

At the moment, PT2 only runs decompositions in the compilation path. I don’t have a recommended way to run decompositions in eager mode. If you think this would be a useful feature, please raise a feature request via a GitHub issue.

In the torch.compile path, computation falls back to eager only when there is a Dynamo graph break.

Sujoy_Saraswati · February 8, 2023, 5:30am

Thanks for the clarification. My understanding is that it is not enough for a backend to enable only Core ATen IR / Prims IR ops, since eager-mode execution is still a valid user option, and even torch.compile can fall back to eager on graph breaks, as you mentioned.

Given this, the set of ops that needs to be implemented on a backend with 2.0 is still the entire aten op set. Is this a valid assumption?

byronyi · February 8, 2023, 12:01pm

Plus the prims set, if you would like your users to be able to use torch.compile at some point.

Chillee · February 8, 2023, 7:50pm

It’s not that hard to run decompositions in “eager mode”, so if you support core Aten IR/Prim IR it would be pretty easy to make it run in eager mode (which is essentially just a graph with a single element).

Chillee · February 8, 2023, 7:51pm

You only need to support the prims/ATen operators that the ops you’re decomposing are built from. For example, there are many prims that Inductor doesn’t support.
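
To make the point above concrete, here is a minimal pure-Python sketch of how one might compute the set of ops a backend has to implement natively once decomposable ops are expanded. The table is a toy: the `aten.native_batch_norm.default` entry lists the op names that appear in the eager-decomposition trace quoted later in this thread, while `toy.fused_op` and the helper `required_ops` are hypothetical names invented for illustration. It does not use or model any real PyTorch API.

```python
# Toy decomposition table: op name -> names of the ops its decomposition calls.
# The native_batch_norm entry mirrors the op names in the trace quoted later
# in this thread; "toy.fused_op" is invented to show transitive expansion.
DECOMPOSITIONS = {
    "aten.native_batch_norm.default": [
        "aten.to.dtype",
        "aten.var_mean.correction",
        "aten.add.Tensor",
        "aten.rsqrt.default",
        "aten.sub.Tensor",
        "aten.mul.Tensor",
        "aten.squeeze.dims",
        "aten.copy_.default",
        "aten.unsqueeze.default",
    ],
    "toy.fused_op": ["aten.native_batch_norm.default", "aten.relu.default"],
}


def required_ops(used_ops, decompositions):
    """Return the ops that must be implemented natively.

    Ops with an entry in `decompositions` are expanded (transitively);
    everything else is treated as a leaf the backend has to provide.
    """
    required = set()
    seen = set()
    stack = list(used_ops)
    while stack:
        op = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        if op in decompositions:
            stack.extend(decompositions[op])
        else:
            required.add(op)
    return required


leaf_ops = required_ops(["toy.fused_op"], DECOMPOSITIONS)
```

In this toy setup the backend would need the nine leaf ops under `native_batch_norm` plus `aten.relu.default`, but neither of the two decomposed ops themselves.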

Sujoy_Saraswati · February 9, 2023, 7:37am

Will the decomposition be done in the framework before the ops are dispatched to backend, or should the backend handle it, maybe via a torch dispatch based decomposition?

There is also a potential performance impact: in torch.compile the decomposition happens only once, at graph compile time, whereas in the eager flow it would happen on every op execution.

Chillee · February 9, 2023, 9:21am

> should the backend handle it, maybe via a torch dispatch based decomposition?

Yeah that’s one reasonable option. Another option is to register a per-op kernel that’s precompiled based off of the decomposition. This could also resolve the issue you mentioned with “decomposition happening during every op execution”.

See https://github.com/pytorch/pytorch/pull/75905
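
A torch-dispatch-based decomposition along the lines of the first option could look roughly like the sketch below. It is an illustration, not the code from any PR mentioned in this thread. The class `EagerDecompMode`, the table `DECOMP_TABLE`, and the hand-written `addcmul_decomp` are hypothetical names, and the decomposition is written by hand rather than taken from PyTorch's own tables. The sketch assumes a PyTorch 2.x install where `torch.utils._python_dispatch.TorchDispatchMode` is available (a private module, so it may change).

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

aten = torch.ops.aten


def addcmul_decomp(self, tensor1, tensor2, *, value=1):
    # Hand-written (hypothetical) decomposition: self + value * t1 * t2.
    return self + value * tensor1 * tensor2


# Hypothetical table mapping an ATen overload to its decomposition.
DECOMP_TABLE = {aten.addcmul.default: addcmul_decomp}


class EagerDecompMode(TorchDispatchMode):
    """Intercepts eager ops and rewrites the ones found in `table`."""

    def __init__(self, table):
        super().__init__()
        self.table = table
        self.log = []

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in self.table:
            self.log.append(f"Decomposing {func}")
            # The mode is disabled inside this handler; re-enter it so the
            # ops issued by the decomposition are intercepted as well.
            with self:
                return self.table[func](*args, **kwargs)
        self.log.append(f"Running {func}")
        return func(*args, **kwargs)


x = torch.ones(3)
a = torch.full((3,), 2.0)
b = torch.full((3,), 3.0)

mode = EagerDecompMode(DECOMP_TABLE)
with mode:
    out = torch.addcmul(x, a, b, value=2)

print("\n".join(mode.log))
```

Note that the table lookup and the Python-level decomposition run on every call here, which is the per-op overhead raised above. The precompiled per-op kernel approach is one way to avoid paying it repeatedly.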

byronyi · March 1, 2023, 1:25am

That works only because the existing CUDA backend supports the entire ATen op set in eager mode. It still seems to me that an alternative backend needs to support the entire op set for both ATen and Prims (neither stabilized), and the number of ops a backend must register strictly goes up from PT 1.x to 2.0.

Indeed, we see XLA and MPS grinding their own decompositions. If backends still need to support the entire ATen op set anyway, I don’t see any incentives for them to decompose first to Prims.

SherlockNoMad · March 10, 2023, 8:23am

Here’s an example of running decomposition in eager mode

github.com/pytorch/pytorch

Example of running Decomposition in eager mode

pytorch:gh/SherlockNoMad/113/base ← pytorch:gh/SherlockNoMad/113/head

opened 11:36PM - 09 Mar 23 UTC

SherlockNoMad

+34 -0

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #96466

enable_eager_decom=False
```
Running aten.add_.Tensor
Running aten.empty.memory_format
Running aten.native_batch_norm.default
```

enable_eager_decom=True
```
Running aten.add_.Tensor
Running aten.empty.memory_format
Decomposing aten.native_batch_norm.default
Running aten.to.dtype
Running aten.var_mean.correction
Running aten.add.Tensor
Running aten.rsqrt.default
Running aten.sub.Tensor
Running aten.mul.Tensor
Running aten.squeeze.dims
Running aten.squeeze.dims
Running aten.mul.Tensor
Running aten.mul.Tensor
Running aten.add.Tensor
Running aten.copy_.default
Running aten.squeeze.dims
Running aten.mul.Tensor
Running aten.mul.Tensor
Running aten.mul.Tensor
Running aten.add.Tensor
Running aten.copy_.default
Running aten.unsqueeze.default
Running aten.unsqueeze.default
Running aten.unsqueeze.default
Running aten.unsqueeze.default
Running aten.mul.Tensor
Running aten.add.Tensor
Running aten.to.dtype
Running aten.to.dtype
Running aten.to.dtype
```
