Request for Review: ZenDNN Integration for AMD CPU Performance Uplift

Hi everyone,

I’d like to draw your attention to a pair of pull requests that integrate AMD’s ZenDNN library into PyTorch. The goal is to accelerate key operations on modern AMD CPUs, and we’re seeking reviews from the community to help get these merged.

This effort is composed of two main PRs:

  1. Infrastructure: ZenDNN build support. This is the foundational PR that adds the necessary build support for the ZenDNN library.
    [ZENDNN] Add ZenDNN as an optional third-party lib by naveenthangudu · Pull Request #161155 · pytorch/pytorch · GitHub
  2. Op integration: bmm and sdpa with ZenDNN. This PR integrates highly optimized ZenDNN matmul kernels for the bmm and sdpa operations in eager mode. These optimizations also benefit the torch.compile and torch.export execution paths.
    [ZENDNN] Add bmm support for eager mode by naveenthangudu · Pull Request #164126 · pytorch/pytorch · GitHub
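To make the scope concrete, here is a minimal sketch of the two eager-mode ops the second PR targets, called with the bfloat16 dtype the integration supports. The shapes are arbitrary and chosen purely for illustration; nothing here is ZenDNN-specific API, since the kernels dispatch transparently under the standard PyTorch calls.

```python
import torch
import torch.nn.functional as F

# Batched matrix multiply (bmm): (B, M, K) @ (B, K, N) -> (B, M, N).
# On a supported AMD CPU, this is one of the calls the ZenDNN kernels accelerate.
a = torch.randn(8, 128, 64, dtype=torch.bfloat16)
b = torch.randn(8, 64, 128, dtype=torch.bfloat16)
out = torch.bmm(a, b)  # shape (8, 128, 128), dtype bfloat16

# Scaled dot-product attention (sdpa) with (batch, heads, seq, head_dim) inputs.
q = torch.randn(2, 4, 128, 64, dtype=torch.bfloat16)
k = torch.randn(2, 4, 128, 64, dtype=torch.bfloat16)
v = torch.randn(2, 4, 128, 64, dtype=torch.bfloat16)
attn = F.scaled_dot_product_attention(q, k, v)  # shape (2, 4, 128, 64)
```

Because dispatch happens under these standard entry points, existing model code needs no changes to pick up the optimized kernels.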

Performance & Activation

These optimizations are activated on AMD CPUs that support the AVX512 instruction set, and the current implementation supports the bfloat16 data type. The ZenDNN library's optimizations include (but are not limited to):

  • AMD CPU cache-friendly data tiling
  • Dynamic partitioning based on thread/core count
  • Dynamic dispatch to optimized microkernels
  • ML-driven heuristics for kernel selection

Based on the TorchInductor Performance Dashboard, we are observing the following improvements for popular model suites with bfloat16 as the dtype:

On the HuggingFace suite we see up to a ~1.25x gain on some models. No regressions were observed, and over half of the models see more than a ~1.05x performance gain.

We are excited to bring these performance enhancements to PyTorch users on AMD hardware and would greatly appreciate your time and expertise in reviewing these contributions.
Going forward, we plan to share our roadmap and the associated PRs for adding further AMD CPU optimizations to PyTorch.

Thank you!
