Problems with torch.compile generated code in tutorial

Hello!

I decided to start the new year by diving into the intricacies of PyTorch 2.0 :slight_smile:

I’m trying to reproduce the example from the tutorial Accelerating Hugging Face and TIMM models, but the code generation in my case differs from what is shown in the tutorial. As I understand it, the Triton code was supposed to use 1 load; in my case there are still 2 loads. I would appreciate your help.

FYI, I tried to reproduce the code in Docker with the image ghcr.io/pytorch/pytorch-nightly:2.0.0.dev20230301-devel

The generated code from my reproduction attempt can be viewed here - pytorch_experiments/torch_compile_first_test/torch_compile_debug/run_2024_01_02_14_44_21_028356-pid_9378/aot_torchinductor/model__0_inference_0.0/output_code.py at master · azsh1725/pytorch_experiments · GitHub

Another strange thing about the tutorial: when I try to reproduce the “a real model” example with the TORCH_COMPILE_DEBUG=1 flag, I don’t see any logs for this model’s compilation, and the aot_torchinductor directory is not created. Is this how it should be?

The point of fusion is that every tensor is loaded just once and fused intermediate tensors are not stored to / loaded from global memory. That is exactly what happens in that example. You are going to need two loads because you have two input tensors, and you need to read the data of each of them!

That blog post has an erratum. It should read “we can turn 4 reads and 3 writes into 2 reads and 1 write”.
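To make the counting concrete, here is a small pure-Python sketch (not PyTorch or Triton code; the kernel boundaries and helper names `eager_traffic` / `fused_traffic` are illustrative assumptions) that tallies global-memory traffic for a computation like `z = cos(sin(a) + b)` before and after fusion:

```python
# Illustrative sketch: count global-memory reads/writes for z = cos(sin(a) + b).
# In eager mode, each op runs as its own kernel and must round-trip
# its inputs/outputs through global memory.

def eager_traffic():
    reads = writes = 0
    # kernel 1: x = sin(a)  -> read a, write x
    reads += 1; writes += 1
    # kernel 2: y = x + b   -> read x and b, write y
    reads += 2; writes += 1
    # kernel 3: z = cos(y)  -> read y, write z
    reads += 1; writes += 1
    return reads, writes

def fused_traffic():
    # One fused kernel: read a and b once each, write z once.
    # Intermediates x and y live in registers, never touching global memory.
    reads, writes = 2, 1
    return reads, writes

assert eager_traffic() == (4, 3)  # 4 reads, 3 writes unfused
assert fused_traffic() == (2, 1)  # 2 reads, 1 write fused
```

Note that the fused kernel still needs 2 reads: one per input tensor, which is why you see two loads in the generated Triton code.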

In the future, please post these questions in https://discuss.pytorch.org/.


Thank you very much for the clarification and answer!