Right, we do not support in-place ops. TorchScript, in fact, has a pass that replaces in-place ops with their out-of-place equivalents, but it is not run in the default pipeline (because it can easily be a pessimization rather than an optimization). NNC itself doesn't currently support in-place ops either, but we're considering changing that.
I also wonder why it changes the expression
(a + b) * c
into
(a * c) + (b * c)
Hm, I'm surprised by this as well. If you're curious to investigate this, I'd suggest running the test with
PYTORCH_JIT_LOG_LEVEL=">>kernel:>>cuda_codegen"
to see where this happens.
Btw, maybe this would be interesting to you too: there is an API to invoke NNC on a graph directly, without going through the fuser pass (if the graph contains unsupported ops, it will fail). It's not a public API, but you might find it convenient for your experiments. Here is an example of how it's used (and I can provide more if needed):
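A minimal sketch of what this looks like through the Python binding `torch._C._te.TensorExprKernel` (an internal API, so it may change between releases; note that the graph must carry complete shape/dtype/device annotations, which is why `parse_ir` with fully typed inputs is used here):

```python
import torch

# A tiny TorchScript graph with complete type annotations on every value
# (TensorExprKernel needs shapes, dtypes, and devices to compile).
graph_str = """
graph(%A : Float(5, 3, strides=[3, 1], device=cpu),
      %B : Float(5, 3, strides=[3, 1], device=cpu)):
  %C : Float(5, 3, strides=[3, 1], device=cpu) = aten::mul(%A, %B)
  return (%C)
"""
graph = torch._C.parse_ir(graph_str)

# Compile the graph with NNC directly, bypassing the fuser pass.
# This raises if the graph contains ops NNC does not support.
kernel = torch._C._te.TensorExprKernel(graph)

# Run the compiled kernel on concrete tensors matching the annotations.
a = torch.rand(5, 3)
b = torch.rand(5, 3)
result = kernel.run((a, b))
```

If you annotate the graph with types that don't match the inputs you pass to `run`, the behavior is undefined, so it's easiest to generate the graph string from a scripted function whose inputs you've already profiled.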