NNC walkthrough: how PyTorch ops get fused

nunoplopes · October 27, 2021, 9:50am

FWIW, this is the test driver I’m using to test the different fusers:

  elif arg == '--fuser-nnc':
    # NNC (TensorExpr fuser) with the LLVM backend turned off
    torch._C._jit_override_can_fuse_on_cpu(True)
    torch._C._jit_override_can_fuse_on_gpu(True)
    torch._C._jit_set_texpr_parallel_cpu_enabled(True)
    torch._C._jit_set_te_must_use_llvm_cpu(False)
    os.environ['PYTORCH_TENSOREXPR_DONT_USE_LLVM'] = '1'
  elif arg == '--fuser-nnc-llvm':
    # NNC with its default codegen (no LLVM opt-out)
    torch._C._jit_override_can_fuse_on_cpu(True)
    torch._C._jit_override_can_fuse_on_gpu(True)
    torch._C._jit_set_texpr_parallel_cpu_enabled(True)
  elif arg == '--nvfuser':
    # nvFuser only: switch the other fusion paths off first
    #os.environ['PYTORCH_CUDA_FUSER_DISABLE_FMA'] = '1'
    torch._C._jit_override_can_fuse_on_cpu(False)
    torch._C._jit_override_can_fuse_on_gpu(False)
    torch._C._jit_set_texpr_fuser_enabled(False)
    torch._C._jit_set_nvfuser_enabled(True)

Not seeing great results so far, to be honest.
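
In case it helps anyone reuse the switches, here is a minimal sketch of the same three branches written as a lookup table. The `torch._C` function names, values, and the environment variable are copied from the driver above; the `FUSER_CONFIGS` dict and the `apply_fuser_config` helper are hypothetical names made up for this illustration, not part of PyTorch. The backend object is passed in as a parameter (an assumption made so the helper can be exercised without importing torch).

```python
import os

# Settings copied from the driver branches above.
# Each call is (torch._C function name, argument).
# The dict layout and helper name are illustrative only, not a PyTorch API.
FUSER_CONFIGS = {
    "--fuser-nnc": {
        "calls": [
            ("_jit_override_can_fuse_on_cpu", True),
            ("_jit_override_can_fuse_on_gpu", True),
            ("_jit_set_texpr_parallel_cpu_enabled", True),
            ("_jit_set_te_must_use_llvm_cpu", False),
        ],
        "env": {"PYTORCH_TENSOREXPR_DONT_USE_LLVM": "1"},
    },
    "--fuser-nnc-llvm": {
        "calls": [
            ("_jit_override_can_fuse_on_cpu", True),
            ("_jit_override_can_fuse_on_gpu", True),
            ("_jit_set_texpr_parallel_cpu_enabled", True),
        ],
        "env": {},
    },
    "--nvfuser": {
        "calls": [
            ("_jit_override_can_fuse_on_cpu", False),
            ("_jit_override_can_fuse_on_gpu", False),
            ("_jit_set_texpr_fuser_enabled", False),
            ("_jit_set_nvfuser_enabled", True),
        ],
        "env": {},
    },
}


def apply_fuser_config(arg, backend, environ=os.environ):
    """Apply the settings for one command-line flag.

    `backend` is expected to be `torch._C`; `environ` defaults to the
    process environment. An unknown flag raises KeyError.
    """
    config = FUSER_CONFIGS[arg]
    # Set environment variables first, then make the calls in listed order.
    environ.update(config["env"])
    for name, value in config["calls"]:
        getattr(backend, name)(value)
    return config
```

With torch installed, usage would presumably look like `import torch; apply_fuser_config('--fuser-nnc', torch._C)`. Keeping the settings in data makes it easier to diff the configurations side by side, at the cost of one extra level of indirection.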
