xFormers: Optimized Transformer building blocks library
Memory-efficient PyTorch components for transformer architectures.
import xformers.ops as xops
output = xops.memory_efficient_attention(query, key, value)

Custom CUDA kernels
xFormers includes its own optimized CUDA implementations for attention mechanisms and fused operations rather than relying solely on PyTorch primitives. This allows for memory-efficient exact attention computation and other specialized operations not available in mainstream frameworks.
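A minimal sketch of the core call (tensor shapes follow the documented [batch, seq_len, num_heads, head_dim] layout; the sizes, dtype, and device here are illustrative, not required):

import torch
import xformers.ops as xops

# Illustrative shapes: batch=1, seq_len=1024, heads=8, head_dim=64
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact attention, computed without materializing the full seq_len x seq_len matrix
out = xops.memory_efficient_attention(q, k, v)  # same shape as q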
Composable architecture
Components are designed as independent, domain-agnostic building blocks that can be combined without boilerplate code. This modular approach enables researchers to construct custom transformer variants while maintaining clean, readable code.
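As an illustration, a hedged sketch using the components API from earlier xFormers releases; the exact module paths and constructor parameters have varied across versions, so treat the names below as assumptions to check against your installed release:

import torch
from xformers.components import MultiHeadDispatch
from xformers.components.attention import ScaledDotProduct

# Swap ScaledDotProduct for another attention component without touching the rest
attention = ScaledDotProduct(dropout=0.1)
mha = MultiHeadDispatch(
    dim_model=256,
    num_heads=4,
    residual_dropout=0.0,
    attention=attention,
)

x = torch.randn(2, 128, 256)  # (batch, seq_len, dim_model)
out = mha(query=x, key=x, value=x)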
Research-oriented components
The library prioritizes access to emerging techniques and experimental operations before they reach mainstream libraries. It provides implementations of sparse attention, block-sparse attention, and other attention variants alongside standard components.
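One such variant, causal attention, is exposed through the attn_bias argument of memory_efficient_attention; a short sketch with illustrative shapes:

import torch
import xformers.ops as xops

q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Causal masking: each position attends only to itself and earlier positions
causal_out = xops.memory_efficient_attention(
    q, k, v, attn_bias=xops.LowerTriangularMask()
)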
import xformers.ops as xops

attention_out = xops.memory_efficient_attention(
    query, key, value
)

Release notes
Adds Blackwell GPU support and flash-attention 2.8.3 compatibility; enables DeepSeek-style compute/comms overlap.
- Use the CUTLASS FMHA op for Blackwell GPUs and set FA3 deterministic mode via the newly exposed flag (see the sketch after this list).
- Leverage FW+BW pass overlap for DeepSeek-like architectures to reduce communication latency.
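A hedged sketch of pinning attention to a specific backend via the op argument; the fmha.cutlass forward/backward op pair exists in xformers.ops, but whether it is the right choice for your GPU and version is an assumption to verify (the deterministic-mode flag itself is not shown, as the release notes do not name it):

import torch
import xformers.ops as xops
from xformers.ops import fmha

q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Pin the forward/backward implementations instead of letting xformers auto-dispatch
out = xops.memory_efficient_attention(
    q, k, v, op=(fmha.cutlass.FwOp, fmha.cutlass.BwOp)
)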
Adds prebuilt binaries for ROCm 6.4; the release notes specify no breaking changes or new requirements.
- Install xformers on AMD GPUs running ROCm 6.4 using the new prebuilt wheel (see the sketch after this list).
- Release notes do not specify breaking changes, deprecations, or migration steps.
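A hedged sketch of installing the prebuilt wheel, assuming the ROCm builds follow the PyTorch extra-index layout used for other xFormers wheels; check the release page for the exact index URL:

# Assumed index URL; verify against the official installation instructions
pip install -U xformers --index-url https://download.pytorch.org/whl/rocm6.4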
Release notes do not specify breaking changes, requirements, or migration steps for this post-release patch.
- Review the commit history or changelog manually to identify the fixes included in this patch.
- Test thoroughly in staging before upgrading production deployments, since no changes are documented; the build check below can confirm exactly what you installed.
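One quick post-upgrade sanity check is the library's built-in info module, which reports the installed version, build flags, and per-operator availability:

# Prints the xformers version, build configuration, and which ops are usable
python -m xformers.info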
Related Repositories
Discover similar tools and frameworks used by developers
yolov7
PyTorch single-stage detector with bag-of-freebies training optimizations.
lerobot
PyTorch library for robot imitation learning and sim-to-real transfer.
gemini-cli
Access Google's powerful Gemini AI models directly from your terminal with an intuitive command-line interface for text, image, and multimodal interactions.
faiss
Efficient approximate nearest neighbor search for billion-scale vectors.
CodeFormer
Transformer-based face restoration using vector-quantized codebook lookup.