What should we do about developer documentation?

We are pretty disorganized with regard to our developer docs (where dev docs are defined as documents covering our workflow or internal technical information; user-facing docs are a separate matter). Surveys of our developers suggest that a lack of component-specific and system-wide documentation impedes productivity.

Currently, our dev docs are scattered across:

  • Internal wikis and posts, which are not visible to open source audiences.
  • Our GitHub wiki, which has no overarching organization, has many rotted links, and is not easily searchable (@ngimel describes it as “write-only”).
  • The codebase itself, which probably works the best currently but is not easily discoverable, searchable, or editable.

Looking around at other large open source projects, a common pattern seems to be to host a developer documentation site that organizes things more comprehensively. Ref:

I think the following properties are valuable for a docs solution:

  • Markdown support
  • Easy to edit, possibly not requiring code editing.
  • But still subject to review? (not sure about this, but arguably our GH wiki is a good example of what happens when you let folks dump docs)
  • Synced with code (possibly: add docs and code atomically, view dev docs at certain hash, etc.)
  • Good search

I guess some questions are:

  • Is it worth trying to find a good way to do the dev docs and consolidate them? Or are we fine with how things are currently?
  • Any thoughts on good candidates? e.g. docs.pytorch.org, GitBook, readthedocs.

GitHub explicitly states that it disabled crawling of wiki pages and points people to GitHub Pages instead.

So, why don’t we just migrate our wiki to GitHub Pages?

It’s a good option; we could probably set up Jekyll and just copy the current docs over.

I don’t think search is the only thing we want to improve about the wiki, though: there’s a ton of stale content and no structure. I wonder how easy or hard it is to structure content with Jekyll.

I suggest we

  1. Make it searchable
  2. Remove stale pages
  3. Remove stale content
  4. Decide on what we want to cover
  5. Cover it
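
For steps 2 and 3, one way to get a first list of stale content is to scan a local checkout of the wiki for relative links that no longer resolve. This is only a rough sketch, not an existing tool: the function name `find_broken_links`, the assumption that pages are `.md` files, and the fallback for links that omit the `.md` suffix are all my own assumptions, and it does not handle `[[Page Name]]` style wiki links.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](target); image links match too.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything starting with a URL scheme (https:, mailto:, ...) is treated as external.
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_links(root):
    """Return (page, target) pairs for relative links that point at nothing.

    `root` is assumed to be a local checkout containing Markdown pages.
    """
    root = Path(root)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip external URLs and in-page anchors; only local files are checked.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            path_part = target.split("#", 1)[0]
            candidate = md_file.parent / path_part
            # Assumption: wiki-style links may omit the .md suffix, so try both.
            with_suffix = candidate.with_name(candidate.name + ".md")
            if not candidate.exists() and not with_suffix.exists():
                broken.append((str(md_file.relative_to(root)), target))
    return broken
```

Running it over a wiki checkout (the path is wherever you cloned it) gives a list of (page, target) pairs to review by hand. It only catches dead relative links, not stale prose, so it is a starting point for the cleanup rather than a replacement for it.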

“But still subject to review?”
I think optional review. A GitHub repo with a wide committer set might be OK for it.

We can start by setting up GitHub Pages. If we actually get it to a polished state, we could expose it under pytorch.org in the future.

Content is more critical than format as long as it’s searchable. It definitely makes sense to do a pass over the existing wiki. It might also be worth attempting to describe the architecture in general (similar to the existing docs for the JIT) and pulling out the relevant parts of various dev-discuss posts.


FYI, the GitHub wiki is a git repo. While it doesn’t directly allow PRs, you can create a separate repo to “edit the wiki” with review, and always mirror that to the wiki.

Python’s and Rust’s dev docs are pretty nice.
Python’s uses Sphinx, which I believe doesn’t have a WYSIWYG editor (something @suo listed as a nice-to-have in the first post), but VS Code supports Markdown editing with live previews, if that’s acceptable.

@dzhulgakov and I have been playing with GitBook, which seems quite nice:

  • Quip-ish WYSIWYG editor with inline diffs, comments, etc.
  • Bi-directional Git sync
  • Everything is Markdown
  • Free for open source projects
  • Indexed by search engines and stuff

In look and feel it is very similar to Rust’s mdBook (I have seen mdBook described explicitly as a GitBook clone somewhere). Example book from us messing around: About this guide - PyTorch Developer Documentation

Edit: oh, I saw that in the mdbook project description lol


MkDocs is a great, simple way to manage docs in Markdown. Material for MkDocs includes pretty good search and support for diagrams and MathJax. Getting started - Material for MkDocs


Hey all, I’m new here and looking to learn and contribute to PyTorch! Has this project already been spun up since January? If not, shall we resume it, @suo?

hey! @b0noI might be the best person to summarize our current efforts around developer documentation and ways to help out.

awesome @suo! @b0noI - can you point me in the right direction as to where/how to start and help from here? Thanks!

Hi!

I am thinking of adding a page to the wiki that serves as a starter version of developer documentation (issue raised here: Adding a page for subfolder/subfile overview/descriptions in the developer wiki · Issue #91842 · pytorch/pytorch · GitHub) and thought it would be of interest to this thread! If I could get permission to add a new page to the wiki, I can get started on documenting the intention/purpose of the various subfolders/subfiles etc. in the main repo.

Hey @suo is there currently any platform for viewing this documentation?

Otherwise, I am totally looking forward to this!!

I would also add the “Developer Notes” section of the main docs website, which contains pages like Autograd mechanics (PyTorch 2.2 documentation); that is the closest thing to long-form developer docs we have.
It does also contain a bit of everything for sure.

Another very low-tech option that covers all your asks above is a good old Markdown doc rendered by GitHub, like nn/doc/index.md at master · torch/nn · GitHub, which you can expect Google to index properly for search.


Hey @albanD, I have a suggestion to make. Would it be possible to use a service/tool like source code mapper to generate source code maps for all or some parts of the PyTorch project (my interest lies in the _dynamo and _inductor directories under torch)? It would make entry into the ecosystem very smooth for new developers!

There are two ways to use the tool:

  1. I create a fork of PyTorch on my GitHub account and set up the tool
  2. PyTorch officially sets it up on the official repo so that anyone in the community can see it directly

I was hoping that the second approach would be better!

Found a relevant link: opensource.guide, which lays out some thoughts on the documentation flow!