New Contributor Looking for Mentorship!

Hi everyone,

I’m Emmett, an eighteen-year-old who’s passionate about contributing to PyTorch! I’ve been teaching myself machine learning for the past three years, and I took a gap year before entering college to do independent research (this is my most recent ongoing project). I want to build my skills as a machine learning developer, and I’ve been advised that contributing to open-source projects is one of the best ways to do so. I use PyTorch very frequently, so it seemed like an obvious project to contribute to.

I would like to spend the upcoming months contributing to PyTorch full-time to build my skills and would really appreciate guidance/mentorship :).

So far I’ve made these two PRs:

And I’m interested in pretty much all tasks, especially more involved ones that require me to understand PyTorch in more depth.

I’m especially interested in working on the following issues:

Hi! Nice to see this after engaging with you in recent discussions like Optimizers' `differentiable` flag doesn't work · Issue #141832 · pytorch/pytorch · GitHub! Since you’re interested in contributing more than just piecemeal, I’d be excited to work with you on tackling a more holistic, medium-sized project around differentiable optimizers if you’re interested.

You’ve already gotten some context around action items, so this is to set a clearer goal and frame the bigger picture. Ultimately, we want to see differentiable optimizers have better support, test coverage, and documentation than they have today.

What would an ideal end state look like?
(1) Better support: people can run differentiable optimizers with lr, betas, and weight_decay as Tensors that require grad, meaning people can train their optimizer hyperparameters (see the rough sketch after this list).

(2) Better documentation: we have a tutorial in the pytorch/tutorials repo (GitHub - pytorch/tutorials: PyTorch tutorials) showing a real use case for differentiable optimizers, and our pytorch/pytorch documentation has a simpler code example. We also raise proper errors/warnings within the code linking to these resources.

(3) Fuller test coverage: our differentiable tests were excluded from our general test infrastructure migration to OptimizerInfo, but ideally we’d use OptimizerInfos for these tests as well. An example of what our differentiable tests should look like can be found in pytorch/test/test_optim.py at main · pytorch/pytorch · GitHub; see test_foreach_large_tensor. We’d want to use the OptimizerInfo infra to encompass all the new tests we want to add, like lr as a Tensor, etc.
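
For (1), here’s a rough sketch of what that end state could look like. This is purely illustrative: the shapes, the stand-in gradient, and whether each line runs cleanly today are assumptions rather than current behavior (making this path actually work is the point of the project).

```python
import torch
from torch.optim import SGD

# A learnable learning rate: the hyperparameter is itself a Tensor that requires grad.
w = torch.randn(3, requires_grad=True).clone()  # clone -> non-leaf, so the in-place update stays in the graph
lr = torch.tensor(0.1, requires_grad=True)

w.grad = torch.ones_like(w)  # stand-in for an inner-loss gradient
opt = SGD([w], lr=lr, differentiable=True)
opt.step()  # with differentiable=True, the update itself is tracked by autograd

# An outer ("meta") loss evaluated after the update; in the target end state,
# gradients flow back through the optimizer step into lr.
meta_loss = (w ** 2).sum()
meta_loss.backward()
print(lr.grad)  # populated once tensor-lr-requires-grad support lands
```

Once this works for SGD, the same shape of example is what we’d generalize to betas/weight_decay and to the other optimizers.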

Like most destinations, this end state can be reached from several directions, and here’s a sample path taken from what we already delineated in the linked issue above:
Step 0 (could be done in parallel or first, based on preference): Migrate the current differentiable tests pytorch/test/optim/test_optim.py at main · pytorch/pytorch · GitHub to use OptimizerInfos + expand test coverage.
Step 1: Support a tensor LR when differentiable is True for SGD. Add a test case and docs in the code.
Step 2: Now, what if the tensor LR requires grad? Make sure this works, and add a test case and docs in the code (see the gradcheck-style sketch after this list).
Step 3: Expand the above to different optimizers, Adam, AdamW, Adagrad, etc. Of course, add test cases and corresponding docs. This might be when it’d be good to consider using OptimizerInfos if you haven’t yet.
Step 4: Add error messaging.
Step 5: Add overarching docs on how to use differentiable optimizers and what’s supported. I could also see this being step 1, with gradual improvements as steps 1-3 are completed.
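
To make step 2 concrete, here’s a rough sketch in the spirit of the existing TestDifferentiableOptimizer tests, which check gradients with gradcheck. The sgd_step helper and the shapes here are just illustrative, and I’d expect this to fail (or error) until steps 1 and 2 land.

```python
import torch
from torch.optim import SGD

def sgd_step(param, grad, lr):
    # Clone so the optimizer's in-place update happens on a non-leaf tensor,
    # mirroring the setup in the existing differentiable optimizer tests.
    p = param.clone()
    p.grad = grad
    SGD([p], lr=lr, differentiable=True).step()
    return p

param = torch.rand(4, dtype=torch.float64, requires_grad=True)
grad = torch.rand(4, dtype=torch.float64, requires_grad=True)
lr = torch.rand((), dtype=torch.float64, requires_grad=True)  # scalar tensor LR

# gradcheck numerically verifies gradients w.r.t. every input that requires grad,
# including lr.
torch.autograd.gradcheck(sgd_step, (param, grad, lr))
```

The eventual version of this would be parametrized over optimizers via OptimizerInfo rather than hard-coding SGD.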

Let me know what you think!

This sounds wonderful and I would absolutely love to work with you on this project! Working on broader differentiable optimizer support seems really meaningful and exciting. I would love to start with step 0 before moving on to step 1, rather than working on both in parallel, because I haven’t used OptimizerInfo and I think I’ll have a bit to learn.

Thank you so much for this opportunity! I’ll start tomorrow after I tie up loose ends with other PyTorch PRs. Do you have a preferred method of communication for us to use?

Cool! I’ve sent you a message (I think? I haven’t sent messages on dev discuss before lol) for further communication.

By the way, when I said “in parallel”, I just meant it could be done at any time, not necessarily at the same time. For example, feel free to do step 1 independently and then look at step 0. I find that it’s easiest to start with something you’re already a bit familiar with, and there’s quite a lot of flexibility here in where you want to start.

Ok that makes sense! And I have received your message.