Hi, I’m a newbie embarking on a project to study PyTorch holistically. At this point I am mostly interested in the core of PyTorch, the CUDA kernels, and the torch.compile project. I am already somewhat familiar with CUDA kernels, as my job involves some hardware-aware optimizations.
As a first step I want to take a step back and read some relevant papers to understand how PyTorch is put together, before doing a deep dive in the code.
These are the papers I have found on the topic:
PyTorch: An Imperative Style, High-Performance Deep Learning Library
PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation
Automatic differentiation in PyTorch
For completeness’ sake, I wanted to ask whether there are other papers or documentation that would be relevant to read. In particular, I cannot find any papers that detail the system design decisions of aten-CUDA. Are there any papers on torch.inductor, torch.dynamo, and everything else related to torch.compile?
EDIT: I have since read the first two papers. I see that the second paper covers torch.inductor and torch.dynamo.
I’m not sure if this is exactly what you want, but there is a well-known and quite good blog post by one of the PyTorch devs: “PyTorch internals” on ezyang’s blog.
It may not be specifically torch.compile or how PyTorch goes down to CUDA kernels, but maybe it’s helpful.
I haven’t read the papers you’ve listed, but as a PyTorch beginner myself, I’m glad you shared them. I will read them myself now.
Update: Oh, and here’s another amazing resource with some PyTorch folks involved: https://www.youtube.com/channel/UCJgIbYl6C5no72a0NUAPcTA. Again, it may not be 100% what you’re specifically asking for here, but it’s a great resource for learning.
Thanks, I found the blog quite helpful. I also found that the wiki on the project’s GitHub page contains a lot of information. Unfortunately, it is a bit all over the place. I was hoping there would be more organized documentation of PyTorch internals, but I suspect that, given the way the project evolved, that was never possible.