Which papers are relevant to understand how the core of PyTorch and the GPU kernels and torch.compile are put together?

Pawl · May 10, 2025, 12:01pm

Hi, I’m a newbie embarking on a project to study PyTorch holistically. At this point I am mostly interested in the core of PyTorch, the CUDA kernels, and the torch.compile project. I am already somewhat familiar with the CUDA kernels, as my job involves some hardware-aware optimizations.

As a first step I want to take a step back and read some relevant papers to understand how PyTorch is put together, before doing a deep dive in the code.

These are the papers I have found on the topic:

PyTorch: An Imperative Style, High-Performance Deep Learning Library
PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation
Automatic differentiation in PyTorch

For completeness’ sake, I wanted to ask whether there are other papers or documentation worth reading. In particular, I cannot find any papers that detail the system design decisions of aten-CUDA. Are there also any papers on torch.inductor, torch.dynamo, and everything else related to torch.compile?

EDIT: I have since read the first two papers. I see that the second paper covers torch.inductor and torch.dynamo.

mooglevich · May 12, 2025, 7:45pm

I’m not sure if this is exactly what you want, but there is a well-known and quite good blog post by one of the PyTorch devs: “PyTorch internals” on ezyang’s blog.

He also hosts a great podcast where he explains design decisions and the history of various PyTorch libraries: https://pytorch-dev-podcast.simplecast.com/episodes

It may not be specifically torch.compile or how PyTorch goes down to CUDA kernels, but maybe it’s helpful.

I haven’t read the papers you’ve listed, but as a PyTorch beginner myself, I’m glad to see you share them. I will read them myself now.

Update: Oh, and here’s another amazing resource with some PyTorch folks involved: https://www.youtube.com/channel/UCJgIbYl6C5no72a0NUAPcTA. Again, it may not be 100% what you’re asking for here, but it’s a great resource for learning.

Pawl · May 13, 2025, 7:46am

Thanks, I found the blog quite helpful. I also found that the wiki on the project’s GitHub page contains a lot of information. Unfortunately, it is all a bit scattered. I was hoping there would be more organized documentation of PyTorch internals, but I suspect that, given the way the project evolved, that was never possible.

jansel · May 13, 2025, 5:48pm

There is also https://docs.pytorch.org/assets/pytorch2-2.pdf

Pawl · May 13, 2025, 8:36pm

Perfect, thanks, I have already started studying it.

mozuysal · May 16, 2025, 7:24am

I came across these that might be relevant:

PyTorch2 Presentation slides @ASPLOS’24 on compiler components: https://github.com/pytorch/workshops/blob/master/ASPLOS_2024/README.md
PyTorch OnBoarding Docs: https://github.com/pytorch/pytorch/wiki/Core-Frontend-Onboarding/9989d142ea694d47a3fbeaa728a448c74706e51f

The onboarding docs also have some labs, but the CUDA ones seem to be internal-only.

Hope this helps.

Pawl · May 16, 2025, 7:06pm

Thanks, I didn’t have those yet. Yeah, I think I can get quite far just by reading the source, and then I’ll ask on the forum when I need advice.
