NNC walkthrough: how PyTorch ops get fused

Right, we do not support in-place ops. TorchScript, in fact, has a pass that replaces in-place ops with their out-of-place equivalents, but it is not run in the default pipeline (because it can easily be a pessimization rather than an optimization). NNC itself doesn't currently support in-place ops either, but we're considering changing that.
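For illustration, here is a minimal sketch of running such a pass by hand. I'm assuming the internal binding torch._C._jit_pass_remove_mutation here, which rewrites in-place ops into their functional equivalents when it can prove that's safe; it may not be the exact pass meant above, and internal bindings can change between releases.

```python
import torch

@torch.jit.script
def foo(x):
    y = x + 5
    y.add_(2)  # in-place op that the pass can rewrite
    return y

graph = foo.graph
print(graph)  # contains an aten::add_ node

# Replace in-place ops with out-of-place equivalents where provably safe.
# Internal, unstable binding -- assumed here for illustration.
torch._C._jit_pass_remove_mutation(graph)
print(graph)  # the aten::add_ node should now be a plain aten::add
```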

> I also wonder why it changes the expression (a + b) * c into (a * c) + (b * c).

Hm, I’m surprised by this as well :) If you’re curious to investigate this, I’d suggest running the test with PYTORCH_JIT_LOG_LEVEL=">>kernel:>>cuda_codegen" to see where this happens.
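In case it helps, here's a minimal sketch of a repro under that logging setup; the function and tensor sizes are made up for illustration, and the cuda_codegen channel naturally requires a CUDA build.

```python
import os

# The logging config is read from the environment, so set it before
# importing torch; ">>" requests the most verbose level for each file.
os.environ["PYTORCH_JIT_LOG_LEVEL"] = ">>kernel:>>cuda_codegen"

import torch

@torch.jit.script
def f(a, b, c):
    return (a + b) * c

# The fuser only rewrites the graph after a few profiling runs, so call
# the scripted function several times to trigger codegen (and its logs).
a, b, c = (torch.randn(1024, device="cuda") for _ in range(3))
for _ in range(5):
    f(a, b, c)
```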

Btw, maybe this would be interesting to you too: there is an API to invoke NNC on a graph directly, without going through the fuser pass (if the graph contains unsupported ops, it will fail). It’s not a public API, but you might find it convenient for your experiments. Here is an example of how it’s used (and I can provide more if needed):
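(A rough sketch, assuming the Python bindings in torch._C._te; these are internal and may differ across PyTorch versions, and depending on the build, CPU codegen may require LLVM. NNC also needs complete shape/dtype/device annotations, which is why the graph below is spelled out in TorchScript IR.)

```python
import torch

# NNC requires full shape/dtype/device info, so build the graph from IR text.
graph_str = """
graph(%a : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu),
      %b : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu),
      %c : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu)):
  %one : int = prim::Constant[value=1]()
  %t : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu) = aten::add(%a, %b, %one)
  %r : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu) = aten::mul(%t, %c)
  return (%r)
"""
graph = torch._C.parse_ir(graph_str)

# Compile the whole graph with NNC directly -- no fuser pass involved.
# This throws if the graph contains ops NNC doesn't support.
kernel = torch._C._te.TensorExprKernel(graph)

a, b, c = (torch.rand(8, 8) for _ in range(3))
res = kernel.run((a, b, c))
print(torch.allclose(res, (a + b) * c))  # sanity-check against eager mode
```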
