xFormers: Optimized Transformer building blocks library
Memory-efficient PyTorch components for transformer architectures.
import xformers.ops as xops
output = xops.memory_efficient_attention(query, key, value)

Custom CUDA kernels
xFormers includes its own optimized CUDA implementations for attention mechanisms and fused operations rather than relying solely on PyTorch primitives. This allows for memory-efficient exact attention computation and other specialized operations not available in mainstream frameworks.
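As a sketch of the math these kernels compute, here is exact softmax attention written in plain PyTorch (a reference implementation for illustration, not the xFormers CUDA kernel; it materializes the full attention matrix, which is exactly the O(n²) memory cost the fused kernel avoids). It assumes the `[batch, seq_len, num_heads, head_dim]` layout that `memory_efficient_attention` uses:

```python
import math
import torch

def reference_attention(q, k, v):
    """Exact softmax attention in plain PyTorch. Computes the same
    result as xops.memory_efficient_attention(q, k, v), but builds
    the full [seq, seq] score matrix instead of tiling it.
    Shapes: [batch, seq_len, num_heads, head_dim]."""
    q = q.transpose(1, 2)  # -> [batch, heads, seq, dim]
    k = k.transpose(1, 2)
    v = v.transpose(1, 2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    out = scores.softmax(dim=-1) @ v
    return out.transpose(1, 2)  # back to [batch, seq, heads, dim]

q = torch.randn(2, 128, 8, 64)
k = torch.randn(2, 128, 8, 64)
v = torch.randn(2, 128, 8, 64)
out = reference_attention(q, k, v)
print(out.shape)  # torch.Size([2, 128, 8, 64])
```

On a CUDA device, replacing the call with `xops.memory_efficient_attention(q, k, v)` should give the same output (up to numerical tolerance) with much lower peak memory.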
Composable architecture
Components are designed as independent, domain-agnostic building blocks that can be combined without boilerplate code. This modular approach enables researchers to construct custom transformer variants while maintaining clean, readable code.
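To illustrate the idea of pluggable components (this is a minimal sketch in plain PyTorch, not xFormers' own block API), the attention operator below is an injected callable, so a reference implementation can be swapped for `xops.memory_efficient_attention` without touching the rest of the block:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Minimal pre-norm transformer block with a pluggable attention
    op. Illustrative only: `attn_op` is any callable taking q, k, v
    in [batch, seq, heads, head_dim] layout, e.g. plain PyTorch
    attention or xops.memory_efficient_attention on CUDA."""
    def __init__(self, dim, num_heads, attn_op):
        super().__init__()
        self.num_heads = num_heads
        self.attn_op = attn_op
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        b, n, d = x.shape
        h = self.norm1(x)
        # Project once, then split into q, k, v heads.
        q, k, v = self.qkv(h).reshape(b, n, 3, self.num_heads, -1).unbind(2)
        attn = self.attn_op(q, k, v).reshape(b, n, d)
        x = x + self.proj(attn)
        return x + self.mlp(self.norm2(x))

def sdpa_op(q, k, v):
    # CPU-friendly stand-in; on CUDA you could pass
    # xops.memory_efficient_attention here instead.
    o = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2))
    return o.transpose(1, 2)

block = TransformerBlock(dim=64, num_heads=8, attn_op=sdpa_op)
out = block(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```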
Research-oriented components
The library prioritizes access to emerging techniques and experimental operations before they reach mainstream libraries. It provides implementations of sparse attention, block-sparse attention, and other attention variants alongside standard components.
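A common attention variant is causal (lower-triangular) masking. The plain-PyTorch sketch below materializes the mask explicitly to show the semantics; with xFormers the same effect is obtained by passing `attn_bias=xops.LowerTriangularMask()` to `memory_efficient_attention`, which the fused kernel handles without materializing a dense mask:

```python
import math
import torch

def causal_attention(q, k, v):
    """Causal attention in plain PyTorch: each position attends only
    to itself and earlier positions. Equivalent in semantics to
    xops.memory_efficient_attention(q, k, v,
        attn_bias=xops.LowerTriangularMask()).
    Shapes: [batch, seq, heads, head_dim]."""
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    n = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    # Mask out future positions (strict upper triangle).
    future = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    return (scores.softmax(dim=-1) @ v).transpose(1, 2)

q = torch.randn(1, 6, 4, 32)
out = causal_attention(q, q, q)
print(out.shape)  # torch.Size([1, 6, 4, 32])
```

A quick sanity check on the masking: position 0 can only attend to itself, so its output equals its own value vector.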
import xformers.ops as xops
attention_out = xops.memory_efficient_attention(
query, key, value
)

v0.0.34: Stable wheels for PyTorch 2.10+
- Migrated xFormers to the PyTorch stable API/ABI, which means that binary builds targeting PyTorch 2.10+ will be compatible with any later PyTorch version
- Removed optimized fast-path of SwiGLU (which was only available for A100 GPUs)
- Removed most legacy components
v0.0.33.post2: Wheels for PyTorch 2.9.1
- Fixed wheel upload to PyPI
Related Repositories
adk-python
Modular Python framework for building production AI agents.
mediapipe
Graph-based framework for streaming media ML pipelines.
docling
Fast document parser for RAG and AI workflows.
ultralytics
PyTorch library for YOLO-based real-time computer vision.
GFPGAN
PyTorch framework for blind face restoration using StyleGAN2 priors.