xFormers: Optimized Transformer building blocks library
Memory-efficient PyTorch components for transformer architectures.
import xformers.ops as xops
output = xops.memory_efficient_attention(query, key, value)

Custom CUDA kernels
xFormers includes its own optimized CUDA implementations for attention mechanisms and fused operations rather than relying solely on PyTorch primitives. This allows for memory-efficient exact attention computation and other specialized operations not available in mainstream frameworks.
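The core trick behind memory-efficient exact attention is computing the softmax incrementally over key/value chunks, so the full attention matrix is never materialized. The sketch below illustrates that idea (the "online softmax") in plain NumPy; it is not xFormers' implementation, whose real kernels are fused CUDA code.

```python
import numpy as np

def chunked_attention(q, k, v, chunk=128):
    """Exact softmax attention computed chunk by chunk over keys/values.

    Illustrative NumPy sketch of the memory-efficient attention idea;
    only a (M, chunk) score tile exists at any time.
    q: (M, D), k: (N, D), v: (N, D)
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    m = np.full(q.shape[0], -np.inf)           # running row-wise max
    l = np.zeros(q.shape[0])                   # running softmax denominator
    acc = np.zeros((q.shape[0], v.shape[-1]))  # running weighted sum
    for start in range(0, k.shape[0], chunk):
        kc, vc = k[start:start + chunk], v[start:start + chunk]
        s = (q @ kc.T) * scale                 # scores for this chunk only
        m_new = np.maximum(m, s.max(axis=-1))
        correction = np.exp(m - m_new)         # rescale previous partials
        p = np.exp(s - m_new[:, None])
        l = l * correction + p.sum(axis=-1)
        acc = acc * correction[:, None] + p @ vc
        m = m_new
    return acc / l[:, None]
```

Because the rescaling is exact, the result matches dense softmax attention up to floating-point rounding, which is what distinguishes this from approximate sparse or low-rank attention schemes.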
Composable architecture
Components are designed as independent, domain-agnostic building blocks that can be combined without boilerplate code. This modular approach enables researchers to construct custom transformer variants while maintaining clean, readable code.
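The pattern can be sketched generically: blocks are plain callables, and wrappers like a residual connection apply uniformly to any of them. All names here are hypothetical, illustrating the composition style rather than xFormers' actual API.

```python
from typing import Callable, Sequence
import numpy as np

Array = np.ndarray
Block = Callable[[Array], Array]  # any array-to-array building block

def residual(fn: Block) -> Block:
    """Wrap any block with a residual (skip) connection."""
    return lambda x: x + fn(x)

def compose(blocks: Sequence[Block]) -> Block:
    """Chain independent blocks into one model without boilerplate."""
    def model(x: Array) -> Array:
        for block in blocks:
            x = block(x)
        return x
    return model

# A toy "transformer layer" assembled from interchangeable parts
# (stand-ins for real normalization / feed-forward components):
norm = lambda x: (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-5)
ffn = lambda x: np.maximum(x, 0.0)
layer = compose([residual(norm), residual(ffn)])
```

Because each block is domain-agnostic, swapping an attention variant or feed-forward implementation means changing one entry in the list, not rewriting the model.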
Research-oriented components
The library prioritizes access to emerging techniques and experimental operations before they reach mainstream libraries. It provides implementations of sparse attention, block-sparse attention, and other attention variants alongside standard components.
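Block-sparse attention restricts the score computation to a coarse block-level mask, skipping whole tiles of the attention matrix. The NumPy sketch below shows the semantics only; xFormers ships fused CUDA kernels for this, and the dense score buffer here exists purely for clarity.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_mask, block=4):
    """Softmax attention restricted to a block-level sparsity pattern.

    block_mask[i, j] == True means query block i may attend to key block j.
    Assumes every query block attends to at least one key block,
    otherwise its softmax is undefined.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.full((q.shape[0], k.shape[0]), -np.inf)
    for i in range(block_mask.shape[0]):
        for j in range(block_mask.shape[1]):
            if block_mask[i, j]:  # only compute scores for allowed tiles
                qs = slice(i * block, (i + 1) * block)
                ks = slice(j * block, (j + 1) * block)
                scores[qs, ks] = (q[qs] @ k[ks].T) * scale
    p = np.exp(scores - scores.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)  # masked-out tiles contribute zero
    return p @ v
```

With an all-True mask this reduces to ordinary dense attention; structured patterns (local windows, strided blocks) trade a controlled amount of context for large compute and memory savings.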
import xformers.ops as xops
attention_out = xops.memory_efficient_attention(
    query, key, value
)

Stable wheels for PyTorch 2.10+
- Migrated xFormers to the PyTorch stable API/ABI, which means that binary builds targeting PyTorch 2.10+ will be compatible with any later version
- Removed the optimized fast-path of SwiGLU (which was only available for A100 GPUs)
- Removed most legacy components
Wheels for PyTorch 2.9.1
- Fixed wheel upload to PyPI
Related Repositories
Crush
LLM-powered coding agent with LSP and MCP integration.
Ultralytics YOLO
PyTorch library for YOLO-based real-time computer vision.
CUTLASS
CUDA C++ templates and Python DSLs for high-performance matrix multiplication on GPUs.
InvokeAI
Node-based workflow interface for local Stable Diffusion deployment.
FastMCP
Build Model Context Protocol servers with decorators.