I am writing about NNsight (https://nnsight.net/), a library built on PyTorch that my team has been building. It is designed to enable remote science on very large neural networks and scalable multi-tenant customization. It defines "tracing contexts" that build computation graphs (using fake tensors / meta tensors), which can then be used to define interventions and customized remote execution of a large model.
In the JAX ecosystem, Google DeepMind has just released Penzai (https://github.com/google-deepmind/penzai), "a JAX research toolkit for building, editing, and visualizing neural networks," which attempts to cover similar use cases.
In NNsight, we also considered a similar API style, but we chose Python context managers instead because we think they are easier to use, letting complex customizations be written readably in ordinary PyTorch code. In addition, we think the remote-execution use case is important, so we have made it part of the core.
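To make the context-manager idea concrete, here is a minimal, self-contained sketch of the general pattern: a context manager whose proxy values record operations into a deferred graph rather than executing them. This is purely illustrative pseudocode of the style, not NNsight's actual API; the names `Tracer` and `Proxy` are hypothetical, and a real system would trace a model's modules and execute the recorded graph (possibly remotely) rather than just collecting it.

```python
class Proxy:
    """Placeholder for a value that will exist only when the graph runs."""

    def __init__(self, tracer, op, args):
        self.tracer = tracer
        self.op = op
        self.args = args
        tracer.graph.append(self)  # register this node in the trace

    def __mul__(self, other):
        # Record the operation instead of executing it immediately.
        return Proxy(self.tracer, "mul", (self, other))


class Tracer:
    """Context manager that records deferred operations into a graph."""

    def __init__(self):
        self.graph = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # In a real system, exiting the context would serialize the graph
        # and ship it off for (remote) execution; here we just keep it.
        return False

    def input(self, name):
        # Create a symbolic input node, analogous to a fake/meta tensor.
        return Proxy(self, "input", (name,))


with Tracer() as tracer:
    x = tracer.input("hidden_state")
    y = x * 2  # recorded into tracer.graph, not executed
```

The point of the style is visible even in this toy: inside the `with` block, interventions read like ordinary Python arithmetic on model values, while under the hood every operation is captured for later execution.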
As we have developed and used the system, we have come to believe this kind of infrastructure may be genuinely important technology: it could be a key part of enabling scientists and developers to work with upcoming 500B-class open models, for which renting dedicated machines just to run them will be exorbitant and complex. If built correctly, NNsight (or a similar system) could be strategically important to Meta in supporting use of very large open models, and good for improving transparency and standardization in the AI ecosystem as a whole.
Right now the team building NNsight is small, but we are energized to get it right. I would love a bit of direct collaboration with the PyTorch (or Llama 3) teams at Meta to make sure we are building it well.
Is there a good way to collaborate with the PyTorch team, whether for advice, feedback, or even contributions? Who should we be talking to?
Do you think PyTorch needs something like Penzai or NNsight?