PyTorch 2.1: automatic dynamic shape compilation, torch.distributed.checkpoint, torch.compile + NumPy, torch.export prototype, and more!

We are pleased to announce the release of PyTorch 2.1.

This is a hugely important release for us: it continues the momentum we established with the 2.0 release back in March, and it gives us plenty of exciting features and hardening work to discuss at the PyTorch Conference, which is coming up in less than two weeks!

Thank you all for your dedication in getting this release out.


We are excited to announce the release of PyTorch® 2.1 (release notes). PyTorch 2.1 offers automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving and loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API.
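To make "automatic dynamic shapes" concrete, here is a minimal sketch written against the 2.1 `torch.compile` API. It relies on the default `dynamic=None` setting, under which torch.compile first specializes on the observed shape and, once a call with a different shape triggers a recompile, treats that dimension as dynamic. The function `scaled_sum` is a hypothetical example, and `backend="eager"` is an assumption made only so the sketch runs without a C++ or Triton toolchain; the default backend is inductor.

```python
import torch

def scaled_sum(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical function: elementwise ops followed by a reduction.
    return (torch.sin(x) * 2.0).sum()

# dynamic is left at its default (None): the first call specializes on the
# input shape; when a later call sees a different length, torch.compile
# recompiles once with that dimension marked dynamic, so further lengths
# can generally reuse the same graph instead of recompiling each time.
# backend="eager" runs the captured graph with ordinary eager kernels.
compiled = torch.compile(scaled_sum, backend="eager")

results = []
for n in (4, 8, 16, 32):
    x = torch.randn(n)
    out = compiled(x)
    # The compiled function should match plain eager execution.
    assert torch.allclose(out, scaled_sum(x))
    results.append(out)
```

Switching to the default inductor backend changes only how the captured graph is executed; the shape-guard and recompilation behavior sketched above comes from the graph-capture layer.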

In addition, this release offers numerous performance improvements (e.g., CPU inductor improvements, AVX512 support, and scaled-dot-product-attention support), along with a prototype release of torch.export (a sound full-graph capture mechanism) and torch.export-based quantization.

Along with 2.1, we are also releasing a series of updates to the PyTorch domain libraries. More details can be found in the library updates blog.

This release is composed of 6,682 commits from 784 contributors since 2.0. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.1. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

Feature Summary
  • torch.compile now includes automatic support for detecting and minimizing recompilations due to tensor shape changes using automatic dynamic shapes.

  • torch.distributed.checkpoint enables saving and loading models from multiple ranks in parallel, as well as resharding due to changes in cluster topology.

  • torch.compile can now compile NumPy operations by translating them into equivalent PyTorch operations.

  • torch.compile now includes improved support for Python 3.11.

  • New CPU performance features include inductor improvements (e.g. bfloat16 support and dynamic shapes), AVX512 kernel support, and scaled-dot-product-attention kernels.

  • torch.export, a sound full-graph capture mechanism, is introduced as a prototype feature, along with torch.export-based quantization.

  • torch.sparse now includes prototype support for semi-structured (2:4) sparsity on NVIDIA® GPUs.
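The NumPy support listed above means that a function written purely against NumPy can be passed to torch.compile; its NumPy calls are translated into PyTorch operations, while the function still accepts and returns NumPy arrays. The sketch below assumes a hypothetical `outer_row_sums` function and uses `backend="eager"` only so it runs without a compiler toolchain.

```python
import numpy as np
import torch

def outer_row_sums(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Hypothetical NumPy-only function: broadcast an outer product, then
    # sum each row. Under torch.compile these calls are traced and
    # translated into equivalent PyTorch operations.
    return np.sum(x[:, None] * y[None, :], axis=1)

compiled_fn = torch.compile(outer_row_sums, backend="eager")

x = np.arange(4, dtype=np.float32)   # [0, 1, 2, 3]
y = np.arange(3, dtype=np.float32)   # [0, 1, 2]
out = compiled_fn(x, y)              # still a NumPy array
```

Each row sum equals `x[i] * y.sum()`, i.e. `x[i] * 3` here, so the compiled result can be checked directly against the uncompiled NumPy function.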


Special thanks to the release v-team: Gregory Chanan, Huy Do, Carl Parker, Svetlana Karslioglu, Jerry Zhang, Alban Desmaison, Eli Uriegas, Omkar Salpekar, Ankith Gunapal, Nikita Shulga