1. Do you mean that making nn.Module parameters inputs to the graph, rather than attributes of the outer GraphModule, is what allows autograd to call the backward hooks? Or is it only needed to make FSDP run correctly?
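
To make sure I'm asking about the right thing, here is a minimal sketch of what I understand by "parameters as graph inputs", using torch.func.functional_call purely as my own illustration (I know this is not necessarily the actual FSDP/Dynamo code path):

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

mod = Tiny()

# Attribute style: the module (or a traced GraphModule) holds its
# parameters internally as attributes.
out_attr = mod(torch.randn(2, 4))

# Input style: parameters are passed in explicitly, so autograd sees them
# as inputs to the call, which is what I assume lets per-parameter
# backward/AccumulateGrad hooks fire.
params = dict(mod.named_parameters())
out_input = torch.func.functional_call(mod, params, (torch.randn(2, 4),))
```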
2. You said the problem of overlapping backward compute with gradient reduction is worked around by "UnspecializedNNModuleVariable". How does UnspecializedNNModuleVariable resolve that problem, and where can I find documentation or a reference about it?
3. Can Lazy Tensor Core trace collective communication ops? I found that pytorch/xla can trace these ops, implemented via a custom plugin. Is that the right way to compile a model that contains collective ops?
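
For context, this is the kind of pattern I mean by "a model with collective ops" (my own minimal example, using a single-process gloo group just so it runs standalone; the real question is whether Lazy Tensor Core / pytorch/xla can capture the all_reduce into its trace):

```python
import os
import torch
import torch.distributed as dist

# Single-process process group only so the snippet is self-contained.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

def forward_with_collective(x: torch.Tensor) -> torch.Tensor:
    y = x * 2
    # Collective op inside the region I would like the backend to trace,
    # rather than fall back / graph-break around it.
    dist.all_reduce(y, op=dist.ReduceOp.SUM)
    return y

print(forward_with_collective(torch.ones(4)))

dist.destroy_process_group()
```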
Thanks