Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles

Hi, I was reading this paragraph:

  • This includes the ability for the existing FullyShardedDataParallel to expose the original parameters (not FlatParameters) via use_orig_params=True, which enables flexible support for multiple parameter groups.

I had a hard time understanding what use_orig_params means. Does it mean that it allows us to include frozen params alongside trainable params? If that's the case, do you think a better flag (perhaps for a future major version) would be allow_frozen_params instead? Otherwise, I might be misunderstanding (please let me know if that's the case!)
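
For context, here is roughly how I currently picture the "multiple parameter groups" part working. This is only a minimal sketch on my end, assuming a toy nn.Sequential model and an already-initialized process group; the decay/no_decay split is just illustrative:

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Toy model for illustration; assumes torch.distributed.init_process_group()
# has already been called and a CUDA device is set for this rank.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

# With use_orig_params=True, my understanding is that named_parameters()
# still yields the original (unflattened) parameters, so they can be
# partitioned into separate optimizer parameter groups.
fsdp_model = FSDP(model, use_orig_params=True)

decay, no_decay = [], []
for name, param in fsdp_model.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(param)

# Two parameter groups, e.g. weight decay on weights but not on biases.
optimizer = torch.optim.AdamW(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
)
```

Is that the intended use, or is the flag really about something else (like mixing frozen and trainable params)?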