Problems with torch.compile generated code in tutorial

Hello!

I decided to start the new year by diving into the intricacies of PyTorch 2.0 :slight_smile:

I’m trying to reproduce the example from the tutorial Accelerating Hugging Face and TIMM models, but the code generation in my case differs from what is shown in the tutorial. As I understand it, the Triton code was supposed to use 1 load; in my case there are still 2 loads. I would appreciate your help.

FYI, I tried to reproduce the code in Docker with the image ghcr.io/pytorch/pytorch-nightly:2.0.0.dev20230301-devel

The generated code from my reproduction attempt can be viewed here - pytorch_experiments/torch_compile_first_test/torch_compile_debug/run_2024_01_02_14_44_21_028356-pid_9378/aot_torchinductor/model__0_inference_0.0/output_code.py at master · azsh1725/pytorch_experiments · GitHub

Another strange thing about the tutorial: when I try to reproduce the “a real model” example with the TORCH_COMPILE_DEBUG=1 flag, I don’t see any logs for this model’s compilation, and the aot_torchinductor directory is not created. Is this how it should be?

The point of fusion is that every tensor is loaded just once and fused intermediate tensors are not stored to / loaded from global memory. That is exactly what happens in that example. You are going to need two loads because you have two input tensors, and you need to read the data of each of them!

That blog post has an erratum. It should read “we can turn 4 reads and 3 writes into 2 reads and 1 write”.
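To make the counting concrete, here is a small pure-Python sketch (not PyTorch or Triton code; the kernel boundaries and helper names `eager_traffic` / `fused_traffic` are illustrative assumptions) that tallies global-memory traffic for a computation like `z = cos(sin(a) + b)` before and after fusion:

```python
# Illustrative sketch: count global-memory reads/writes for z = cos(sin(a) + b).
# In eager mode, each op runs as its own kernel and must round-trip
# its inputs/outputs through global memory.

def eager_traffic():
    reads = writes = 0
    # kernel 1: x = sin(a)  -> read a, write x
    reads += 1; writes += 1
    # kernel 2: y = x + b   -> read x and b, write y
    reads += 2; writes += 1
    # kernel 3: z = cos(y)  -> read y, write z
    reads += 1; writes += 1
    return reads, writes

def fused_traffic():
    # One fused kernel: read a and b once each, write z once.
    # Intermediates x and y live in registers, never touching global memory.
    reads, writes = 2, 1
    return reads, writes

assert eager_traffic() == (4, 3)  # 4 reads, 3 writes unfused
assert fused_traffic() == (2, 1)  # 2 reads, 1 write fused
```

Note that the fused kernel still needs 2 reads: one per input tensor, which is why you see two loads in the generated Triton code.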

In the future, please post these questions in https://discuss.pytorch.org/.


Thank you very much for the clarification and answer!