Testing in PrivateUse1 for out-of-tree PyTorch Backends

Testing for PrivateUse1 and out-of-core backends has been evolving quickly, so I wanted to write down a quick update on where we are today and where we expect to go in the short term.

See the new User Guide for details on what is happening here: Accelerator Integration — PyTorch 2.9 documentation

  1. Make running the right test easy

Generally we expect 2 types of tests here:

  • Ensure all ops are properly implemented, by re-using OpInfo. The OpInfo database is accessible from torch.testing._internal.common_methods_invocations and can be used to validate that CPU and your accelerator return the same result for a broad range of inputs, or more generally to drive anything that requires running PyTorch ops.
  • Ensure PyTorch modules behave as expected via “device-generic” tests. For this, you can re-use the in-core test classes and run them in your CI. Very rough steps to do that are:
Add your device test base class and register it:
from torch.testing._internal.common_device_type import device_type_test_bases
from torch.testing._internal.common_device_type import DeviceTypeTestBase

class YourTestBase(DeviceTypeTestBase):
    # The device string your backend registered for PrivateUse1
    device_type = "yours"

device_type_test_bases.append(YourTestBase)


Import the module containing the test class you are migrating. This module should invoke instantiate_device_type_tests on the test class, generating your device-specific versions.
Expose the generated test classes in the current module's globals so that test discovery can find them, and set the test classes' module to the current module. This step depends a lot on how you run your tests and how much discovery is needed.

import test_tensor_creation_ops

TestTensorCreationYours = test_tensor_creation_ops.TestTensorCreationYours
TestTensorCreationYours.__module__ = __name__

You can then skip/xfail there:
# cat empty fatally crashes, so skip it
TestTensorCreationYours.test_cat_empty_yours.__unittest_skip__ = True

# kaiser_window is expected to fail on Yours device
TestTensorCreationYours.test_kaiser_window_yours_float64.__unittest_expecting_failure__ = True

Then run that file like your usual tests.
  2. Have a unified “standard test suite” and validation

This is unfortunately not done yet, but it is a topic we are interested in discussing further.

  3. Generic testing of the extension point

OpenReg has been evolving quickly to allow testing these things in core.

It is continuously growing to validate all extension points directly from in-core tests.

  4. CI/CD pipelines

We also want to enable projects to run tests and provide feedback. This is still under heavy discussion, but the main phases we expect are:

  • Early: you run CI at your preferred cadence on your repo (for example, pinned to a PyTorch release and only for changes in your repo).
  • Ramping up: start triggering your CI for new changes happening in core. In this world, commits in core would trigger your CI through a webhook. You can use this to ensure your project keeps working with PyTorch nightly/trunk.
  • Stabilizing: in this phase, you would be triggering and running on all trunk commits (including just before PR merge), with signal from the run sent back to the core monitoring infrastructure and visible at https://hud.pytorch.org/. The goal is to capture metrics on job stability and coverage.
  • Stable: same as stabilizing, but the job is blocking for viable/strict (meaning it is also blocking for PR merges/nightlies/releases).

The different criteria and requirements to move between these levels are still under discussion, and we’ll refine them as we onboard more users here.


I think it would make sense to specifically add DeviceTypeTestBase example as part of OpenReg as it would provide a good foundation for testing an out-of-tree Pytorch backend

I 100% agree with that, and I expect a working example of OpInfo-based test as well.

These have two benefits:

  • Provide good working example as documentation on how to do it.
  • Ensure we don’t change any of these APIs (most of which are private) going forward.

cc @fffrog

@JRosenkranz Thank you very much, that’s very good advice.

@albanD Thank you for your valuable suggestions. Maintaining the ease of use and stability of the testing suite is also very important for the new backend. I will carefully consider how to implement the relevant functions in OpenReg to serve as a reference and for BC guard.


@albanD When you refer to a unified standard test suite, would this be defined as the minimum acceptable set of tests for an out-of-tree device to be considered “stable”? For instance, some devices may support dtypes, ops, etc. that others don’t; however, we may not want to include those as part of this standard because they may not be as important to the general case.

This is a great question!
There is a reason why this one is still very open-ended and why we haven’t made much progress there yet.

This is mainly because it spans a very wide breadth of topics, including, but not limited to:

  • What is considered “close enough” numerical error for element-wise, reduction ops, etc
  • What is considered “good coverage” out of the 3k+ ops in PyTorch, dtypes, broadcasting, etc
  • What is considered “good support of components” based on the device generic tests (about autograd, distributed, nn, optim, foreach, sparse, serialization, etc)
  • What is considered “good support in community” based on third party repos (vllm, deepspeed, ray I guess but also hf/transformer, lightning, ao, bnb, safetensors, etc)
  • How are these things going to be checked
  • Who is going to be an independent third party to validate these results
  • How do we present these results to end users so that they have a good understanding of the current state of a given backend.
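To make the first point concrete: the closest thing PyTorch has today to a “close enough” definition is torch.testing.assert_close, whose per-dtype default tolerances (rtol=1.3e-6, atol=1e-5 for float32) any standard would have to either adopt or override per op family. A minimal sketch; the device usage is simulated and the threshold behavior shown is that of the defaults, not an agreed-upon standard:

```python
import torch

# Deterministic CPU reference (sum of 0..1023 = 523776)
x = torch.arange(1024, dtype=torch.float32)
expected = x.sum()

# On a real backend, `actual` would come from your device,
# e.g. x.to("yours").sum().cpu(); here we simulate a result
# that is within float32 rounding noise of the reference.
actual = expected * (1 + 1e-6)
torch.testing.assert_close(actual, expected)  # within default float32 tolerances

# A 1% relative error, by contrast, is flagged by the defaults.
bad = expected * 1.01
failed = False
try:
    torch.testing.assert_close(bad, expected)
except AssertionError:
    failed = True
assert failed, "a 1% error should exceed the default float32 tolerances"
```

Where exactly those knobs should sit for element-wise vs reduction ops (and per backend) is precisely the open question above.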

I do expect it is going to take some time and a concerted effort to figure this out.


@albanD I’d be very happy to collaborate on figuring out/improving testing for out-of-tree backends. Right now I’m building a PyTorch backend for WebGPU, and this was exactly the question I needed to answer for myself. I haven’t done much better than adding some basic unit tests while implementing each aten op, more like sanity checks, so I could probably try out our testing ideas on this project. Please let me know if I can help with that.
