PrimTorch: How backend/compiler writers interact with various IRs

Hi,
I’ve been getting familiar with PT 2.0 recently by reading several docs. Below are some of my understandings about the different IRs; I’m wondering whether they are right or not:

  1. I can get all of PyTorch’s functionality by integrating with any IR, i.e. even if my backend integrates with only the Core ATen IR, the Prims IR, or the Inductor loop-level IR, it still covers all of the “torch.xxx” operators.

  2. Since the Core ATen IR is a subset of the ATen operators, I use it the same way as the former, i.e. through “native_functions.yaml”. (But how can I tell which ops are “Core ATen” and which are plain “ATen”? :face_with_monocle:)

  3. As for the Prims IR, I need to implement my compiler in Python and then dispatch the fused ops to my hardware(?)

The last two questions are about how to integrate with the PrimTorch IRs. I have little confidence in these understandings, as I have not found any relevant references :joy:. If I missed any integration-related docs or tutorials, please let me know. Thanks very much~

Hi @Minerva_Yu,

No answers, just hints - I am in the same situation as you, trying to understand how I should integrate my compiler with the torch IR. I got some pointers here:

You can play with the backend and get only torch.xxx ops with no aten, or only aten ops if you run the AOT compiler. It seems that to get the core aten and prims levels you can use what @ezyang pointed out there - proxy_tensor, make_fx & TorchRefsMode - but I am not sure how these levels relate to the ones described in “IRs — PyTorch master documentation”.
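
For what it’s worth, here is the pattern I’ve been experimenting with. This is only a minimal sketch: make_fx and TorchRefsMode live under internal modules (torch.fx.experimental.proxy_tensor and torch._prims.context), so the import paths and behaviour may change between releases.

import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.context import TorchRefsMode

def f(x):
    return torch.abs(x) + 1

x = torch.randn(8)

# Plain make_fx gives an FX graph of ATen ops.
aten_gm = make_fx(f)(x)
print(aten_gm.graph)

# Tracing under TorchRefsMode reroutes torch.* calls through torch._refs,
# so the resulting graph is (mostly) prims ops.
with TorchRefsMode():
    prims_gm = make_fx(f)(x)
print(prims_gm.graph)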

Also see the post here: PrimTorch: could we get pure core-aten-ops or prims-ops after aot_autograd - #5 by SherlockNoMad, on how to get the Core ATen IR.

It also seems that you can define the decompositions you want, in Python, and get varying levels of IR from torch.xxx down to “core aten and prims” - see the notebook I linked to in that issue, and the sketch below.
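
Here is roughly what that looks like, assuming the torch._decomp helpers are available in your build (they are private APIs, so treat this as illustrative only):

import torch
from torch._decomp import core_aten_decompositions, get_decompositions
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return torch.nn.functional.silu(x)

x = torch.randn(8)

# Decompose all the way down to the Core ATen opset.
core_gm = make_fx(f, decomposition_table=core_aten_decompositions())(x)
print(core_gm.graph)

# Or pick only the decompositions you care about.
my_table = get_decompositions([torch.ops.aten.silu.default])
custom_gm = make_fx(f, decomposition_table=my_table)(x)
print(custom_gm.graph)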

For the second question, a new discovery:
ops in the “native_functions.yaml” file that carry the “core” tag are the Core ATen IR ops.
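
You can also check the tag at runtime instead of grepping the yaml. A rough sketch, assuming a build where op tags are exposed as torch.Tag and on OpOverload.tags (aten.abs is just an example op here):

import torch

# The resolved overload exposes the tags from native_functions.yaml,
# so "core" membership can be queried at runtime.
op = torch.ops.aten.abs.default
print(op.tags)                    # includes torch.Tag.core if the op is in the Core ATen opset
print(torch.Tag.core in op.tags)  # True -> part of the Core ATen IR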

And by trying the in-place variant torch.abs_(x), the FX graph I found is as follows:

...
abs_1: f16[2048] = torch.ops.aten.abs.default(arg0_1)
...
copy_: f16[2048] = torch.ops.aten.copy_.default(arg0_1, abs_1);  arg0_1 = None
...

Although I have not found where the abs_ op gets decomposed into the abs and copy_ ops - probably that is some magic in the register_inplace() function.
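
One way to reproduce a similar abs/copy_ graph yourself is to trace through functionalization with make_fx. A minimal sketch (I’m not certain this is the same mechanism that produced the graph above, so take it as an experiment rather than an explanation):

import torch
from torch.func import functionalize
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    torch.abs_(x)   # in-place op on the input
    return x

x = torch.randn(2048, dtype=torch.float16)

# Functionalization replaces the in-place abs_ with an out-of-place abs
# plus an explicit copy_ back into the mutated input, which matches the
# graph excerpt above.
gm = make_fx(functionalize(f, remove="mutations_and_views"))(x)
print(gm.graph)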