Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles

Hi, I was reading this paragraph:

  • This includes the ability for the existing FullyShardedDataParallel to expose the original parameters (not FlatParameters) via use_orig_params=True, which enables flexible support for multiple parameter groups.

I had a hard time understanding what use_orig_params means. Does it mean that it allows us to include frozen params alongside trainable params? If that's the case, do you think a better flag (perhaps for a future major version) would be allow_frozen_params instead? Otherwise, I might be misunderstanding (please let me know if that's the case!)
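
For context, here is roughly how I currently picture the "multiple parameter groups" part working. This is only a minimal sketch on my end, assuming a toy nn.Sequential model and an already-initialized process group; the decay/no_decay split is just illustrative:

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Toy model for illustration; assumes torch.distributed.init_process_group()
# has already been called and a CUDA device is set for this rank.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

# With use_orig_params=True, my understanding is that named_parameters()
# still yields the original (unflattened) parameters, so they can be
# partitioned into separate optimizer parameter groups.
fsdp_model = FSDP(model, use_orig_params=True)

decay, no_decay = [], []
for name, param in fsdp_model.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(param)

# Two parameter groups, e.g. weight decay on weights but not on biases.
optimizer = torch.optim.AdamW(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
)
```

Is that the intended use, or is the flag really about something else (like mixing frozen and trainable params)?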