Internals of Deferred Module Initialization

Lately I have been getting a lot of questions about how fake tensors and deferred module initialization work. Most people find them a bit “too magical” and wonder how we implemented them. Since we now also have major external projects, such as PyTorch Lightning, depending on deferred module initialization, I decided to write down the internal mechanics of both features and published a “design notes” section on the public torchdistX website. If you have had similar questions, I encourage you to check out the documentation. Of course, I would also appreciate any feedback.
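For readers unfamiliar with the feature, here is a minimal pure-Python sketch of the core idea. The function names mirror the torchdistX API (`deferred_init`, `materialize_module`), but the mechanics below are heavily simplified: the real implementation runs constructors under a fake-tensor mode so parameter shapes and dtypes are known without allocating device memory, whereas this sketch merely records the constructor call and replays it later.

```python
# Conceptual sketch only -- NOT the torchdistX implementation.
# Deferred init: record a module's constructor call instead of running it;
# materialization replays the recorded call to build the real module.

class DeferredModule:
    """Stands in for a module whose construction has been deferred."""
    def __init__(self, module_cls, *args, **kwargs):
        self._module_cls = module_cls
        self._args = args
        self._kwargs = kwargs


def deferred_init(module_cls, *args, **kwargs):
    # torchdistX additionally traces the constructor with fake tensors so
    # shape/dtype metadata is available; here we only record the call.
    return DeferredModule(module_cls, *args, **kwargs)


def materialize_module(deferred):
    # Replay the recorded constructor; real allocation happens here.
    return deferred._module_cls(*deferred._args, **deferred._kwargs)


class Linear:
    """Toy stand-in for a module with a large parameter."""
    def __init__(self, in_features, out_features):
        # Pretend this allocates a big weight matrix.
        self.weight = [[0.0] * in_features for _ in range(out_features)]


m = deferred_init(Linear, 4, 2)   # no parameter memory allocated yet
real = materialize_module(m)      # actual allocation happens here
print(len(real.weight), len(real.weight[0]))  # → 2 4
```

The deferred handle is cheap to create no matter how large the module would be, which is what makes the pattern useful for initializing models that do not fit on a single device.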



Some naive questions:

  • Are there any advantages to meta tensors over fake tensors?
  • Would fake tensors ever “replace” meta tensors? Or could they both be unified at some point?
  • Is there any relationship to lazy tensors?