NNC walkthrough: how PyTorch ops get fused

Right, we do not support in-place ops. TorchScript, in fact, has a pass that replaces in-place ops with their out-of-place equivalents, but it is not run in the default pipeline (because it can easily be a pessimization rather than an optimization). NNC itself doesn't currently support in-place ops either, but we're considering changing that.
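For illustration, here is a minimal sketch of running such a pass by hand. I'm assuming the internal binding torch._C._jit_pass_remove_mutation here, which rewrites in-place ops into their functional equivalents when it can prove that's safe; it may not be the exact pass meant above, and internal bindings can change between releases.

```python
import torch

@torch.jit.script
def foo(x):
    y = x + 5
    y.add_(2)  # in-place op that the pass can rewrite
    return y

graph = foo.graph
print(graph)  # contains an aten::add_ node

# Replace in-place ops with out-of-place equivalents where provably safe.
# Internal, unstable binding -- assumed here for illustration.
torch._C._jit_pass_remove_mutation(graph)
print(graph)  # the aten::add_ node should now be a plain aten::add
```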

> I also wonder why it changes the expression (a + b) * c into (a * c) + (b * c).

Hm, I’m surprised by this as well :) If you’re curious to investigate this, I’d suggest running the test with PYTORCH_JIT_LOG_LEVEL=">>kernel:>>cuda_codegen" to see where this happens.
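In case it helps, here's a minimal sketch of a repro under that logging setup; the function and tensor sizes are made up for illustration, and the cuda_codegen channel naturally requires a CUDA build.

```python
import os

# The logging config is read from the environment, so set it before
# importing torch; ">>" requests the most verbose level for each file.
os.environ["PYTORCH_JIT_LOG_LEVEL"] = ">>kernel:>>cuda_codegen"

import torch

@torch.jit.script
def f(a, b, c):
    return (a + b) * c

# The fuser only rewrites the graph after a few profiling runs, so call
# the scripted function several times to trigger codegen (and its logs).
a, b, c = (torch.randn(1024, device="cuda") for _ in range(3))
for _ in range(5):
    f(a, b, c)
```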

Btw, maybe this would be interesting to you too: there is an API to invoke NNC on a graph directly, without going through the fuser pass (if the graph contains unsupported ops, it will fail). It’s not a public API, but you might find it convenient for your experiments. Here is an example of how it’s used (and I can provide more if needed):
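(A rough sketch, assuming the Python bindings in torch._C._te; these are internal and may differ across PyTorch versions, and depending on the build, CPU codegen may require LLVM. NNC also needs complete shape/dtype/device annotations, which is why the graph below is spelled out in TorchScript IR.)

```python
import torch

# NNC requires full shape/dtype/device info, so build the graph from IR text.
graph_str = """
graph(%a : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu),
      %b : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu),
      %c : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu)):
  %one : int = prim::Constant[value=1]()
  %t : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu) = aten::add(%a, %b, %one)
  %r : Float(8, 8, strides=[8, 1], requires_grad=0, device=cpu) = aten::mul(%t, %c)
  return (%r)
"""
graph = torch._C.parse_ir(graph_str)

# Compile the whole graph with NNC directly -- no fuser pass involved.
# This throws if the graph contains ops NNC doesn't support.
kernel = torch._C._te.TensorExprKernel(graph)

a, b, c = (torch.rand(8, 8) for _ in range(3))
res = kernel.run((a, b, c))
print(torch.allclose(res, (a + b) * c))  # sanity-check against eager mode
```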
