Apologies that these are a bit delayed.
The core maintainers meet quarterly and generally approve the minutes from meeting N at the top of meeting N+1, so these should have been posted after the Dec 13 meeting. For no particularly good reason, I am only now getting around to posting them.
Note that there were three specific questions asked using the form, so I’ve pulled the responses to those questions to the top. These responses were written by core maintainers.
Specific responses to each of the three questions:
Mingfei Ma topic: “LLM inference challenge: bare metal solutions such as ggerganov/llama.cpp gave very good portability on various devices (and potentially it is capable of achieving optimal performance), also the project has pretty cool int4 implementation. Would like to understand how pytorch is going to handle this situation”
Response: PyTorch export() will provide a canonical representation of a full-graph model. We are thinking about creating a demo or tutorial that shows how to create a lightweight interpreter in C that combines that representation with the operators’ C implementations. We believe the performance of this solution will be similar to llama.cpp’s.
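For context, here is a minimal sketch of the capture step this response refers to, assuming a recent PyTorch 2.x export() API; the TinyModel module is made up for illustration, and the C interpreter itself is not shown:

```python
import torch

# Hypothetical toy model standing in for a real LLM.
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x @ x.T)

# export() captures the model as a single full graph.
ep = torch.export.export(TinyModel(), (torch.randn(4, 4),))

# This graph is the canonical representation a lightweight C interpreter
# could walk node by node, dispatching to each operator's C implementation.
print(ep.graph)
```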
Jiong Gong topic: “I’d like to learn the plan of stabilizing the ATen/Prims IR specification, how the backward compatibility is going to be maintained and what is the philosophy to add new ops to it.”
Response: Part of PyTorch export() will be a canonical set of 150-200 ATen operators into which other ATen ops can be decomposed. We will maintain backwards compatibility on this op set to enable various backends.
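To illustrate, here is a minimal sketch of decomposing an exported graph down to that canonical set, assuming the run_decompositions() API available in recent PyTorch 2.x releases; the module is made up:

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.nn.functional.silu(self.proj(x))

ep = torch.export.export(M(), (torch.randn(2, 8),))

# With no custom table supplied, run_decompositions() lowers the captured
# graph to the Core ATen IR -- the canonical op set described above -- so a
# backend only needs to implement that stable set.
core_ep = ep.run_decompositions()
print(core_ep.graph)  # contains only core ATen ops
```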
Jiong Gong second topic: “I’d like to learn the roadmap/plan supporting the export of a complete graph for models, in particular, those LLMs having dynamic control flow (i.e., decoder loop).”
Response: Complete graph support will be provided within torch.export(); this will be announced at the PyTorch Conference in Oct 2023. torch.export() will have some limited support for control flow through explicit control flow functions (torch.cond is available and torch.scan is on the roadmap), but we’re also looking into having the exported / generated code able to override parts of the graph and call custom functions.
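As an illustration of the explicit control flow functions mentioned, here is a minimal sketch using torch.cond, assuming a PyTorch version where it is exposed; the Gate module is made up:

```python
import torch

class Gate(torch.nn.Module):
    def forward(self, x):
        def true_fn(x):
            return x.sin()

        def false_fn(x):
            return x.cos()

        # Unlike a plain Python `if`, torch.cond records both branches in
        # the exported graph, so the choice stays data-dependent at runtime.
        return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))

ep = torch.export.export(Gate(), (torch.randn(3),))
print(ep.graph)
```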
Redacted notes taken during Aug 18th meeting:
Approved minutes from last meeting.
Three tech topics submitted through the form:
(These are the three topics quoted in full at the top of this post.)
Discussion:
Soumith:
All three of these might be addressed with export()
Will be announced at XLA event in a few weeks and then amplified at the PyTorch Conference
Can we create a hacky backend that uses export() and then makes a lightweight runtime?
Collaboration with Horace
That would give a good answer for A and B
For item B, the other thing to note is that we have a canonical set of ~150 ATen ops.
Greg:
Is this a marketing thing … because we don’t really plan on productizing this?
Soumith:
Something more hacky is okay
Dima:
Do you want to advertise AOTInductor?
Soumith:
No, hackier.
Dima: What is the perf like?
Soumith: The perf looks pretty competitive with llama.cpp.
Edward: Right now all the messaging on this is very internally focused.
Do we want to make this really a good OSS project? If so that will take more effort
Greg: As demonstrations go it sounds good. “Just write it yourself” might be an okay response.
Edward: On the third point:
We basically don’t have a plan.
Greg: What do you mean?
Ed: We are going to have some control flow ops
We don’t have any story for anything complex… if you have the simplest possible thing it can work… but there’s no clear story for getting to anything more complex.
Ed: We shouldn’t really have a plan for this. But how do we message that?
Greg: Should we try to guide people who have more complex stories … do we want to encourage them to use compile() and Python in production?
Soumith: Isn’t Dima’s team just announcing an inference platform based on Python in service?
Dima: Yes… and we are probably going to do some sort of a blog post on this.
No maintainer or core maintainer nominations received.
Updates from Soumith:
[redacted discussion about a potential core maintainer]
LF has a PyTorch documentary in progress.
Request is: if you can think of anyone who should be interviewed, please add them to the chat.
Challenges: there are two main ones, Distributed and Compilation.
Distributed has gotten to better health. Will Constable is now leading that team. They are making more deliberate progress on questions like how to trace a comms operator and whether we can build better higher-level constructs. We have also unblocked their access to large sets of GPUs.
TorchData is deprecated … it was going in the wrong direction and we lost personnel.
Soumith is working with Kartikay and some of the data loader folks.
Also got a ping from the MLCommons folks that they might want to make a common data loader.
Inspiration: MosaicML’s streaming data loader.
Multimodal seems to be picking up traction in the community… Soumith is working with Kartikay and Laurence to package things together into a monorepo to provide more interoperability.
The main pain point … people are building large multimodal models … with all different kinds of tokens (text, image, etc.).
Need to be able to embed all of these using a unified preproc and postproc
Video, image, and audio are semi-interoperable… text folks do their own thing.
We need to think about how to have a multimodal data loader.
Moto is looking at the IO components.
First step is putting all of this into the same packaging.
Greg: Update on FP8
Trying to make FP8 work better.
Dtype and bindings have landed (see the sketch below).
We are working with NVIDIA on a PT2-compatible Transformer Engine variant.
They will keep their own implementation for backwards compatibility.
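For reference, a minimal sketch of the landed dtype bindings, assuming PyTorch 2.1+; this exercises only the dtypes themselves, not the Transformer Engine work:

```python
import torch

x = torch.randn(4, 4)

# Two FP8 formats landed as first-class dtypes:
# e4m3 (more precision) and e5m2 (more dynamic range).
x_e4m3 = x.to(torch.float8_e4m3fn)
x_e5m2 = x.to(torch.float8_e5m2)

# At this level FP8 is mostly a storage format; general compute paths
# typically upcast first (dedicated scaled-matmul kernels live elsewhere).
y = x_e4m3.to(torch.float32) @ x_e5m2.to(torch.float32)
print(x_e4m3.dtype, x_e5m2.dtype, y.shape)
```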
Soumith:
We have a 60-minute keynote for the main product announcement.
Ian has an initial draft:
Storyboard, who is talking, and what we cover in the follow-up sessions.
Action Item: Greg + Chris to make sure that is all coherent.
Chris: Just a quick heads up that we want to do PyTorch community awards at the conference and I will be reaching out to this team to help with nominations and/or selection. Details still TBD.
Soumith:
Lucy and Ibrahim mentioned that there will be a little more budget next year.
Soumith wants us to consider: how do we empower a bunch of social media content creation? Can we improve on the process we have now with Jen and Kylee, where we push things to them and then they promote them?
Soumith to follow up with Lucy and Jen.
Ed: Not sure how I would solve this problem
Dima: What is it that the Linux Foundation wants … do we just want product announcements, or do we want to retweet really cool stuff?
Soumith: Strategic stuff… consciously address gaps … and amplify solutions to market-perception gaps. Choose topics that are important and make sure we have some content.
Back on topic 2:
Ed: There is some discussion going on about training support in the canonical ATen op set.
It would support the workflow of exporting a model and then training that model.