ABI Stability in the ML Ecosystem: LibTorch ABI Stable & TVM-FFI

An ABI (Application Binary Interface) is the contract for how compiled code interacts at the binary level, enabling interoperability across languages and toolchains. ABI stability is the guarantee that this contract will not change across versions, meaning that a compiled binary can run against a different version of a library than the one it was built with. For PyTorch, the users who stand to benefit most from ABI stability are kernel writers in the PyTorch ecosystem. Many others would benefit too, such as those writing Rust bindings, folks extending PyTorch in more nontrivial ways, projects binding torch code to a different Tensor library (like ExecuTorch), and compilers that want a stable codegen target.
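To make the idea of a broken binary contract concrete, here is a minimal sketch using Python's `ctypes` to mimic C struct layouts. The `TensorHeaderV1` and `TensorHeaderV2` structs are hypothetical and invented purely for illustration; they are not real PyTorch or TVM-FFI types. The sketch shows what happens when a library inserts a field between releases and a binary built against the old layout reads memory produced by the new one.

```python
import ctypes


# Hypothetical v1 layout of a library's tensor header (illustration only,
# not a real PyTorch or TVM-FFI struct).
class TensorHeaderV1(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("ndim", ctypes.c_int32),
    ]


# Hypothetical v2 layout: a new `device` field was inserted before `ndim`,
# which shifts the byte offset of `ndim`.
class TensorHeaderV2(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", ctypes.c_int32),
        ("ndim", ctypes.c_int32),
    ]


def read_ndim_as_v1(header_v2):
    """Simulate a binary compiled against v1 reading memory laid out by v2."""
    stale_view = ctypes.cast(
        ctypes.pointer(header_v2), ctypes.POINTER(TensorHeaderV1)
    ).contents
    return stale_view.ndim


# The "new library" creates a header with ndim=3 on device 7.
header = TensorHeaderV2(data=None, device=7, ndim=3)

# The "old binary" still uses the v1 offset for `ndim`, so it reads the
# bytes that now hold `device` instead.
ndim_seen_by_old_binary = read_ndim_as_v1(header)

print("v1 offset of ndim:", TensorHeaderV1.ndim.offset)
print("v2 offset of ndim:", TensorHeaderV2.ndim.offset)
print("ndim seen by old binary:", ndim_seen_by_old_binary)
```

Nothing crashes here: the old binary silently reads the wrong field. A stable ABI rules out this class of change, for example by only appending fields or by hiding layouts behind opaque handles and accessor functions.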

With regard to authoring kernels, note that ABI stability only matters if you are shipping precompiled kernels, either written in native code or produced by AOT (ahead-of-time) compilers like AOT Triton, CuTe DSL AOT, or AOTInductor. If you're just using kernels in a JIT (just-in-time) way (like through CuTe DSL or Triton), ABI stability is not a problem for you because you are always rebuilding native code specifically against whatever tensor library is loaded.

There are two main projects working towards ABI stability in the ML ecosystem: LibTorch ABI Stable and TVM-FFI. TVM-FFI is designed as a minimal, general-purpose stable ABI layer that is not tied to any framework, while LibTorch ABI Stable focuses on making existing PyTorch APIs ABI stable. When should you use one over the other? The answer is simple: they don't provide the same features, so you should pick the one with the features you need!

What are the key features of each project, and where do they partially overlap?

Thanks @albanD and Tianqi for reviewing!
