How to keep up with updates of operator signatures

I recently started working against the nightly build of pytorch and found out that the signatures of two functions (that I had implemented in my backend) had changed:

-    // {"schema": "aten::mean.out(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!)", "dispatch": "True", "default": "False"}
-    Tensor & mean_out(const Tensor & self, IntArrayRef dim, bool keepdim, c10::optional<ScalarType> dtype, Tensor & out)
+    // {"schema": "aten::mean.out(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!)", "dispatch": "True", "default": "False"}
+    Tensor & mean_out(const Tensor & self, OptionalIntArrayRef dim, bool keepdim, c10::optional<ScalarType> dtype, Tensor & out)

... 
   
-    // {"schema": "aten::sum.IntList_out(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!)", "dispatch": "True", "default": "False"}
-    Tensor & sum_out(const Tensor & self, IntArrayRef dim, bool keepdim, c10::optional<ScalarType> dtype, Tensor & out)
+    // {"schema": "aten::sum.IntList_out(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!)", "dispatch": "True", "default": "False"}
+    Tensor & sum_out(const Tensor & self, OptionalIntArrayRef dim, bool keepdim, c10::optional<ScalarType> dtype, Tensor & out)

Basically, IntArrayRef was replaced with OptionalIntArrayRef.
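For reference, here is roughly what the adjustment looks like on my side (a simplified sketch: the kernel name and the actual reduction launch are placeholders; the point is that a missing dim list now means "reduce over all dimensions"):

    #include <ATen/ATen.h>
    #include <numeric>
    #include <vector>

    // Sketch of a backend-side kernel after the change. Only the dim parameter
    // changed, so the body just has to handle the "no dims given" case.
    at::Tensor & ocl_sum_out(const at::Tensor & self,
                             c10::OptionalIntArrayRef dim,   // was at::IntArrayRef
                             bool keepdim,
                             c10::optional<at::ScalarType> dtype,
                             at::Tensor & out) {
        std::vector<int64_t> dims;
        if (dim.has_value()) {
            // Explicit dim list: copy it out.
            dims.assign(dim->begin(), dim->end());
        } else {
            // nullopt now encodes "reduce over every dimension".
            dims.resize(self.dim());
            std::iota(dims.begin(), dims.end(), int64_t{0});
        }
        // ... launch the OpenCL reduction over `dims`, honoring keepdim/dtype ...
        return out;
    }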

Now I understand that I can’t expect that the signatures will be static. Yet I have several questions:

  1. How can I get advance notice that some existing function signatures have changed, so I can be prepared instead of getting bug reports that something does not work?
  2. How should I handle building the backend for different torch versions, since one user may have 1.14 and another 1.16? Since it is an out-of-tree backend, I probably need to support at least several versions of pytorch, as I can’t expect everybody to use the same latest version. How do you recommend doing it?
  3. How frequently are existing signatures modified, and how much time will I have to spend chasing changes in torch and updating the functions?

Thanks!

Artyom Beilis

Anybody?

I have currently lined up my OpenCL backend against 1.13 and it works against nightly as well.

However, what is the best way to handle future updates/changes, short of limiting users to the absolute latest version of pytorch?

How is it handled in other out of tree backends?

Alas, there’s currently no good way to do this automatically. @ezyang put out a proposal for how to handle this a long time ago, but it was never implemented.

The only proxy for this is to have a look at the list in pytorch/check_forward_backward_compatibility.py at master · pytorch/pytorch · GitHub. That list gets a new entry whenever a change modifies the signature of a function in native_functions.yaml. I guess the older entries are also cleaned up from time to time.

That file does indeed have to be modified when the c++ API changes, so it is a good thing to watch.

These days, with the work related to Symbolic Shapes (you can find detailed updates at State of symbolic shapes branch - #9 by ezyang), we are actually updating a lot of these signatures to support SymInt.
We made sure that users don’t see the change and that it is not BC-breaking for them. But if you do things like string matching of the signatures, that might be a challenge.
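To give a concrete example, those updates produce diffs of the same flavour as the ones quoted at the top of the thread, along these lines (illustrative only; the exact set of ops and overloads that moved to SymInt varies between releases):

-    // {"schema": "aten::narrow(Tensor(a) self, int dim, int start, int length) -> Tensor(a)", ...}
-    Tensor narrow(const Tensor & self, int64_t dim, int64_t start, int64_t length)
+    // {"schema": "aten::narrow(Tensor(a) self, int dim, SymInt start, SymInt length) -> Tensor(a)", ...}
+    Tensor narrow(const Tensor & self, int64_t dim, c10::SymInt start, c10::SymInt length)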

I’m afraid there isn’t a great answer to the binary compatibility question though. The way it works with libs like torchvision, which have a similar issue, is that nightly PT works with nightly torchvision and we release new versions at the same time.
We also pin the torchvision release (the pip package dependency) to a very specific PT release, so that if a user asks for a given version of torchvision, they get a working version of PT alongside it.
And if a user wants a specific version of PT that is not a release, then they also have to build torchvision from source.

I think you could follow a similar process. There are messages in the #announcement channel on slack when RCs are happening for a new release. That should give you time to prepare a binary of your own to release alongside the PT one.

And how do you deal with it in out-of-tree backends? Or do you keep them pinned to a PT release as well?

I’ll probably need to add some ifdefs to make sure I can compile the project against at least one or two versions back, since I can’t ask users to upgrade every single release, nor can I maintain several separate versions, as operators are added/implemented frequently according to user needs.
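Something like the following, assuming the TORCH_VERSION_* macros from torch/version.h stay available (the version cutoff below is only an illustration; I would have to check where each change actually landed):

    #include <torch/version.h>   // TORCH_VERSION_MAJOR / MINOR / PATCH
    #include <ATen/ATen.h>

    // Illustrative cutoff only: pick the declaration matching the PyTorch
    // headers the backend is being compiled against.
    #if TORCH_VERSION_MAJOR > 1 || (TORCH_VERSION_MAJOR == 1 && TORCH_VERSION_MINOR >= 13)
    at::Tensor & ocl_mean_out(const at::Tensor & self, c10::OptionalIntArrayRef dim,
                              bool keepdim, c10::optional<at::ScalarType> dtype,
                              at::Tensor & out);
    #else
    at::Tensor & ocl_mean_out(const at::Tensor & self, at::IntArrayRef dim,
                              bool keepdim, c10::optional<at::ScalarType> dtype,
                              at::Tensor & out);
    #endif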

Thanks!

Anyway, I’m really pleased with the progress of OOT backend support. I understand what a big deal it can be to be able to run pytorch on non-nVidia GPUs using OpenCL.


And how do you deal with it in out-of-tree backends?

Torchvision is out of tree!
We do keep them pinned as:
PT nightly ↔ Vision nightly (not sure how the pinning is done here tbh, but we can figure it out from CI configs/asking around)
PT 1.13 ↔ Vision 0.14
PT 1.12 ↔ Vision 0.13
etc

This has been working ok for us.

I’ll probably need to add some ifdefs to make sure I can compile the project against at least one or two versions back

We don’t do that. We do keep the release branches on github though, like “release/v1.13” (similarly in torchvision), which you can always use if you need an older version.
We also explicitly don’t backport fixes/features to older releases.

since I can’t ask users to upgrade every single release, nor can I maintain several separate versions, as operators are added/implemented frequently according to user needs.

Your binary will need to be binary compatible with PT. So users will have to use the same PT version (not pinned to an exact hash, but it will need to be relatively close) as the one you used to compile the binary.
I can see a couple of ideas here:

  • You do all your releases against the latest released PT version, so that you end up with something like this (with made-up version numbers):
    PT 1.11 ↔ OCL 0.12
    PT 1.11 ↔ OCL 0.13
    PT 1.11 ↔ OCL 0.14
    PT 1.12 ↔ OCL 0.15
    PT 1.12 ↔ OCL 0.16
    PT 1.13 ↔ OCL 0.17
    etc
    The only constraint is that, when a new version of PT comes out, you cut a new version of the lib to make sure that people can use the latest release.
  • Pin your branch and releases to PT nightlies (at least early in development, while things move fast). We don’t remove nightly binaries (https://download.pytorch.org/whl/nightly/torch/), so you can pin against any nightly that you want and your users will be able to download it. It does force your users to use a nightly version of PT.
  • If your users need other libs that rely on PT (like torchvision?), then I would suggest using the same pin as them, to ensure users can use both with a single version of PT.
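On the source-build side, one small thing that can help is making a mismatched build fail loudly at compile time, so users who build against the wrong PT headers get a clear error instead of a runtime surprise. Just a sketch, and the supported range here is made up:

    #include <torch/version.h>

    // Made-up supported range: refuse to build against PyTorch headers this
    // backend release was not written for.
    static_assert(TORCH_VERSION_MAJOR == 1 &&
                  TORCH_VERSION_MINOR >= 12 && TORCH_VERSION_MINOR <= 13,
                  "this OCL backend release supports PyTorch 1.12 - 1.13 only");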

Binary compatibility is a challenge though and we’re looking into it! cc @seemethere if you have other ideas?

Binary compatibility is a challenge though and we’re looking into it

Actually, I’m looking into API compatibility. At this point I distribute the backend as source from github, since it is still changing rapidly. It is also very easy to build on Linux.

Of course, in the future I want to be able to distribute it as a wheel via pip. However, this is yet another thing I will need to learn how to do (especially how to handle the Windows case).

The only constraint is that, when a new version of PT comes out, you cut a new version of the lib to make sure that people can use the latest release.

Ok, I see. I think I’ll work against the latest stable release, and if something breaks in nightly I’ll freeze that work and start a branch for adapting to nightly until it is released.

I assume my OpenCL backend releases will be much more frequent than PT’s, since there are lots of ops to implement and bugs to fix.

So I’ll need to figure out how to handle the transition when a nightly breaks the API and I can no longer keep working against stable and need to move to nightly (for example, the 1.12 and 1.13 op signatures are different and I can’t share backend code between them).

Thanks for the good points to think about.
