State of PT2 OSS Issues: Q1 2024

Criteria of success

  1. 90% of PT2 high-priority issues closed (H1 roadmap KR)
  2. 75% of PT2 bug issues are closed (H1 BE KR)
  3. “Unclassified” open PT2 issues
    • 0% of open high-priority PT2 issues more than a month old are “unclassified”
    • <1% of open PT2 issues are “unclassified”
  4. 90% of PT2 issues opened more than a year ago are closed

System-level improvements

In Q1, we built the following structures to monitor the health of the PT2 OSS issue queues and to distribute issues among developers.

  • PT2 issue labeling and issue queue health ownership

    • Revamped PT2 OSS issue labeling system (Richard Zou)
    • Classified issues via labels that map to foundational and holistic components
    • Established component STOs (Meta-internal) as owners of issue queue health, by label
  • Established a scrubbing process for old PT2 issues: at the branch cut for each new release, we start a sweep through PT2 issues that were opened two releases earlier

    • Pre-2.0 issues (William Wen, Yidi Wu): closed 160+ stale issues; now 98 open / 654 closed
    • Pre-2.1 issues (Michael Lazos, Tugsbayasgalan (Tugsuu) Manlaibaatar): closed 70+ stale issues out of 218 at the start of the effort

  • Defined metrics (Criteria of Success above) that we can meaningfully move

    • Considerations: focus on high-priority issues and small/medium bugs (excluding features, enhancements, and bugs requiring big infra changes).
  • Leveraged weekly PT2 triage meeting for accountability (high-priority) and metrics awareness.

Next steps

The goal is to reach our “Criteria of Success” by the end of H1 with an established structure to sustain PT2 OSS issue queue health.

For Q2, we will focus on the following areas as the next steps.

  1. Component/holistic areas shall establish a working bug distribution mechanism and set an expectation of bug fixes (from the issue queue) for every developer on the pod. The Inductor component did a really good job closing high-priority issues.
  2. Triage all unclassified issues to the component or holistic area level. (close to done)
  3. Better classification of bug issues. The current classification of bug issues is too coarse-grained. There are 700 open PT2 bug issues today. Many should probably be labeled “feature” or “enhancement.” We may further reduce the bug issue pool by differentiating bugs that would require big infra change (“month”) and those we do not plan to fix in 6 months (“low-priority”).

We would also welcome community contributors to take issues labeled “high priority,” “bug,” or “good first issue.”

State of PT2 OSS issues (Apr 10, 2024)

Summary

  • High-priority issues: 39/134 (open/closed), 22.5% open
  • Bug issues: 650/1583 (open/closed), 29% open
  • “Unclassified”
    • Unclassified high-priority PT2 open issues: 5
    • Unclassified PT2 open issues: 19/670 (2.7%)
  • PT2 issues more than one year old: 112/727 (13.4%)
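
The open percentages for the high-priority and bug pools follow from open / (open + closed). A minimal sketch of that arithmetic, using the counts from this post and the closed-share targets from Criteria of success; the function and variable names are illustrative and not part of any PT2 tooling:

```python
def open_share(open_count: int, closed_count: int) -> float:
    """Percentage of issues in a pool that are still open."""
    total = open_count + closed_count
    return 100.0 * open_count / total if total else 0.0


# (open, closed, target closed %) from the Summary and Criteria of success.
POOLS = {
    "high priority": (39, 134, 90.0),
    "bug": (650, 1583, 75.0),
}


def report(pools):
    """Return (name, closed %, target met?) for each pool."""
    rows = []
    for name, (open_count, closed_count, target) in pools.items():
        closed_pct = 100.0 - open_share(open_count, closed_count)
        rows.append((name, round(closed_pct, 1), closed_pct >= target))
    return rows


for name, closed_pct, met in report(POOLS):
    print(f"{name}: {closed_pct}% closed, target met: {met}")
```

On these numbers, both pools are still short of their H1 targets, which is consistent with the Q2 focus areas listed under Next steps.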

High-priority issues

Definition:

  • has “oncall: pt2” and “high priority”
  • excludes “needs reproduction”, “months”, “module: rocm”
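
For readers without access to the internal query, the definition above can be approximated as a GitHub issue search. This is a sketch only: the `build_issue_query` helper is hypothetical, the `pytorch/pytorch` repository is an assumption, and the resulting string may not match the internal query exactly.

```python
def build_issue_query(include, exclude, repo="pytorch/pytorch", state="open"):
    """Build a GitHub issue-search query string from label lists.

    The repo default is an assumption; the post does not name the repository.
    """
    parts = [f"repo:{repo}", "is:issue", f"is:{state}"]
    parts += [f'label:"{label}"' for label in include]    # labels that must be present
    parts += [f'-label:"{label}"' for label in exclude]   # labels that must be absent
    return " ".join(parts)


HIGH_PRIORITY_INCLUDE = ["oncall: pt2", "high priority"]
HIGH_PRIORITY_EXCLUDE = ["needs reproduction", "months", "module: rocm"]

query = build_issue_query(HIGH_PRIORITY_INCLUDE, HIGH_PRIORITY_EXCLUDE)
print(query)
```

The printed string can be pasted into the GitHub issue search box; passing `state="closed"` gives the other half of the open/closed count.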

Stats

Bug issues

Definition:

  • has “oncall: pt2”
  • excludes “module: rocm”, “skipped”, “oncall: cpu inductor”, “module: windows”, “module: wsl”, “topic: fuzzer”, “module: minifier”, “module: onnx”, “module: xla”
  • excludes “needs reproduction”
  • excludes “feature”, “module: performance”, “enhancement”, “month”
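
The same definition can be applied locally to issues that have already been fetched, as a set-membership check on labels. A minimal sketch, assuming each issue is a dict with a `labels` list; the sample issues and the `is_pt2_bug` helper are made up for illustration:

```python
BUG_INCLUDE = {"oncall: pt2"}
BUG_EXCLUDE = {
    "module: rocm", "skipped", "oncall: cpu inductor", "module: windows",
    "module: wsl", "topic: fuzzer", "module: minifier", "module: onnx",
    "module: xla", "needs reproduction", "feature", "module: performance",
    "enhancement", "month",
}


def is_pt2_bug(labels):
    """True if the labels satisfy the bug-issue definition above."""
    labels = set(labels)
    return BUG_INCLUDE <= labels and labels.isdisjoint(BUG_EXCLUDE)


# Hypothetical sample data; numbers and label combinations are invented.
issues = [
    {"number": 1, "labels": ["oncall: pt2"]},
    {"number": 2, "labels": ["oncall: pt2", "enhancement"]},
    {"number": 3, "labels": ["module: onnx"]},
]

bug_numbers = [issue["number"] for issue in issues if is_pt2_bug(issue["labels"])]
print(bug_numbers)
```

Keeping the include and exclude labels in two sets makes it easy to adjust the definition if the classification of bug issues becomes finer-grained.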

Stats
