In collaboration with Tri Dao and the xformers team, PyTorch has uploaded official Flash Attention 3 wheels to https://download.pytorch.org/whl/flash-attn-3/. We have pre-built wheels for multiple CUDA versions (12.6+, 13), CPU architectures (x86, ARM), and operating systems (Linux, Windows).
Before today, there was no standard distribution of pre-built wheels for Flash Attention 3 (FA3). Due to the number of kernels FA3 contains, installing from source takes over an hour without a parallel build, and the build can segfault with some nvcc versions or fail on Windows depending on the MSVC/CUDA version. Furthermore, xformers is the only library that supports FA3 on Windows, so it is often required as a dependency solely for that reason.
By building these wheels for the PyTorch community, we’ve eliminated this complicated build process and removed the dependency on xformers.
These wheels are also built against the stable LibTorch ABI, and are therefore compatible with any Python >= 3.10 and torch >= 2.9.
To install, run pip install flash-attn-3 --index-url=https://download.pytorch.org/whl/{CUDA}
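As a concrete sketch, here is what the install might look like for a CUDA 12.6 environment. The cu126 tag follows PyTorch's usual index naming convention and is an assumption here; substitute the tag matching your CUDA version, and consult the index at https://download.pytorch.org/whl/flash-attn-3/ for the full list of available builds.

```shell
# Assumed example: pick the index tag matching your CUDA toolkit (e.g. cu126).
CUDA=cu126
# Install the pre-built FA3 wheel from the PyTorch package index.
pip install flash-attn-3 --index-url="https://download.pytorch.org/whl/${CUDA}"
```

Because the wheels are ABI-stable, the same wheel works across any supported Python and torch version, so there is no need to reinstall when upgrading torch within the >= 2.9 range.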