Hi! Nice to see this after engaging with you in recent discussions like Optimizers' `differentiable` flag doesn't work · Issue #141832 · pytorch/pytorch · GitHub! Since you're interested in contributing more than just piecemeal fixes, I'd be excited to work with you on a more holistic, medium-sized project around differentiable optimizers if you're interested.
You've already gotten some context on action items, so this is meant to frame the bigger picture with a clearer end goal. Ultimately, we want differentiable optimizers to have better support, test coverage, and documentation than they do today.
What would an ideal end state look like?
(1) Better support: people can run differentiable optimizers with lr, betas, and weight_decay as Tensors that require grad, meaning people can train their optimizer hyperparameters (see the sketch after this list).
(2) Better documentation: we have a tutorial in pytorch/tutorials showing a real use case for differentiable optimizers, and our pytorch/pytorch documentation has a simpler code example. We also raise proper errors/warnings within the code linking to these resources.
(3) Fuller test coverage: our differentiable tests were excluded from the general test infrastructure migration to OptimizerInfo, but ideally we'd use OptimizerInfos for these tests as well. A good example of the shape our differentiable tests could take is test_foreach_large_tensor in pytorch/test/test_optim.py. We'd want the OptimizerInfo infra to encompass all the new tests we want to add, like lr as a Tensor, etc.
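To make (1) concrete, here's roughly the kind of thing I'd want someone to be able to write once we're done. This is a sketch of the target usage, not something guaranteed to work today (that's the point of the project), and it borrows the clone-the-param trick from our existing differentiable tests so the in-place step doesn't hit the leaf tensor:

```python
import torch

# A parameter and a learnable learning rate (the hyperparameter we want to train).
weight = torch.rand(4, dtype=torch.float64, requires_grad=True)
lr = torch.tensor(0.01, dtype=torch.float64, requires_grad=True)

# Clone so the optimizer's in-place update lands on a non-leaf tensor and
# stays on the autograd graph.
param = weight.clone()
param.grad = torch.rand_like(param)

opt = torch.optim.SGD([param], lr=lr, differentiable=True)
opt.step()

# The updated parameter is now a function of lr, so a downstream "meta" loss
# can backprop into the hyperparameter.
meta_loss = param.sum()
meta_loss.backward()
print(lr.grad)  # the whole goal: a real gradient for the learning rate
```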
Like most destinations, this end state can be achieved from several directions. Here's one sample path, taken from what we already delineated in the linked issue above:
Step 0 (could be done in parallel or first, based on preference): Migrate the current differentiable tests in pytorch/test/optim/test_optim.py to use OptimizerInfos + expand test coverage. A rough sketch of what that could look like is below.
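Here's roughly the shape an OptimizerInfo-based differentiable test could take. The class and test names are placeholders, and the details would need to follow whatever test_optim.py already does (e.g. restricting to optimizers that actually accept a differentiable flag, and skipping kwarg combos like fused that don't make sense with differentiable):

```python
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_optimizers import optim_db, optims
from torch.testing._internal.common_utils import TestCase, run_tests


class TestDifferentiableOptimizers(TestCase):
    # Parametrize over every optimizer in optim_db instead of hand-rolling
    # one test per optimizer class.
    @optims(optim_db, dtypes=[torch.float64])
    def test_step_stays_on_graph(self, device, dtype, optim_info):
        for optim_input in optim_info.optim_inputs_func(device=device):
            kwargs = dict(optim_input.kwargs, differentiable=True)
            # Clone a leaf so the in-place step happens on a non-leaf tensor.
            param = torch.rand(4, device=device, dtype=dtype, requires_grad=True).clone()
            param.grad = torch.rand_like(param)
            optim = optim_info.optim_cls([param], **kwargs)
            optim.step()
            # If the step really went through autograd, the updated param has history.
            self.assertIsNotNone(param.grad_fn)


instantiate_device_type_tests(TestDifferentiableOptimizers, globals())

if __name__ == "__main__":
    run_tests()
```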
Step 1: support tensor LR when differentiable is True for SGD. Add a test case and docs in the code.
Step 2: now, what if the tensor LR requires grad? Make sure this works and add a test case and docs in the code (see the sketch below covering Steps 1 and 2).
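A rough sketch of what Steps 1 and 2 are aiming for, modeled on the gradcheck-based pattern our current differentiable tests use. The helper name is made up, and this is the target behavior rather than something that necessarily passes today:

```python
import torch
from torch.autograd import gradcheck


def _sgd_step_with_tensor_lr(p, grad, lr):
    # Made-up helper mirroring the existing differentiable tests: clone the
    # parameter, run one step, and return the updated value so gradcheck can
    # probe d(updated_param) / d(lr).
    p = p.clone()
    p.grad = grad
    opt = torch.optim.SGD([p], lr=lr, differentiable=True)
    opt.step()
    return p


p = torch.rand(8, dtype=torch.float64, requires_grad=True)
grad = torch.rand(8, dtype=torch.float64, requires_grad=True)

# Step 1: a plain Tensor lr should just work when differentiable=True.
_sgd_step_with_tensor_lr(p, grad, torch.tensor(0.9, dtype=torch.float64))

# Step 2: an lr that requires grad should get correct gradients flowing into it.
lr = torch.tensor(0.9, dtype=torch.float64, requires_grad=True)
gradcheck(_sgd_step_with_tensor_lr, (p, grad, lr))
```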
Step 3: Expand the above to other optimizers: Adam, AdamW, Adagrad, etc. Of course, add test cases and corresponding docs. This might be when it'd be good to consider using OptimizerInfos if you haven't yet.
Step 4: Add error messaging.
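By "error messaging" I mean something along these lines inside the optimizer code; the helper name, condition, error type, wording, and link here are all placeholders just to illustrate the idea:

```python
def _check_differentiable_supported(differentiable: bool, foreach: bool) -> None:
    # Purely illustrative (the name and condition are made up); the real checks
    # would live inside the relevant optimizer and link to whatever docs and
    # tutorial we end up writing in Step 5.
    if differentiable and foreach:
        raise RuntimeError(
            "foreach=True is not supported with differentiable=True. See the "
            "differentiable optimizer notes in the torch.optim docs for what is "
            "currently supported: https://pytorch.org/docs/stable/optim.html"
        )
```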
Step 5: Add overarching docs on how to use differentiable optimizers and what’s supported. I could also see this being step 1, with gradual improvements as steps 1-3 are completed.
Let me know what you think!