MLX: Array framework for Apple silicon machine learning
Lazy-evaluated NumPy-like arrays optimized for Apple silicon.
MLX is an array computation framework designed specifically for machine learning workloads on Apple silicon processors. It implements lazy evaluation with dynamic graph construction, allowing computations to be deferred until results are needed and computation graphs to adapt to changing input shapes without recompilation. The framework features a unified memory model where arrays reside in shared memory accessible by both CPU and GPU devices without explicit data transfers. Common use cases include training transformer models, fine-tuning large language models with techniques like LoRA, and running inference for generative tasks on macOS and iOS devices.
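A minimal sketch of the lazy model (shapes and values here are illustrative, not from the MLX docs):

import mlx.core as mx

x = mx.random.normal((1024, 1024))
y = x @ x.T + 1.0  # only records operations in the graph; nothing runs yet
mx.eval(y)         # forces evaluation of the deferred computation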
Unified memory architecture
Arrays live in shared memory accessible by both CPU and GPU without requiring explicit data transfers between devices. Operations can transparently execute on either device type.
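For example, the same arrays can feed an operation on either device by passing a stream argument, with no copies in between (shapes illustrative):

import mlx.core as mx

a = mx.random.normal((100,))
b = mx.random.normal((100,))

# Same buffers in unified memory; pick the device per operation
mx.add(a, b, stream=mx.cpu)
mx.add(a, b, stream=mx.gpu)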
Dynamic graph construction
Computation graphs are built at runtime rather than pre-compiled, allowing function arguments to change shapes without triggering recompilation or requiring graph tracing.
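A small sketch of what this allows (the loss function and shapes are illustrative): a transformed function can be called with differently shaped inputs, with no retracing step in between:

import mlx.core as mx

def loss(w, x):
    return mx.mean((x @ w) ** 2)

grad_fn = mx.grad(loss)  # gradient with respect to the first argument, w

# The graph is rebuilt at each call, so shapes are free to change
g_small = grad_fn(mx.ones((8, 2)), mx.random.normal((16, 8)))
g_large = grad_fn(mx.ones((64, 2)), mx.random.normal((4, 64)))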
Multi-language API consistency
Python, C++, C, and Swift APIs closely mirror each other, with higher-level packages like mlx.nn and mlx.optimizers following PyTorch conventions for model building.
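A brief sketch of that PyTorch-flavored style (layer sizes are illustrative; note that MLX modules define __call__ rather than forward):

import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def __call__(self, x):
        return self.fc2(nn.relu(self.fc1(x)))

model = MLP()
optimizer = optim.SGD(learning_rate=0.1)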
import mlx.core as mx
a = mx.array([1, 2, 3, 4])
b = mx.array([5, 6, 7, 8])
c = a + b
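# Nothing has been computed yet; printing forces evaluation of the lazy graph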
print(c)

This release significantly improves bandwidth performance with JACCL on macOS 26.3+ and includes various fixes.
- Much faster bandwidth with JACCL on macOS >= 26.3
- Disable managed memory on WSL when concurrentManagedAccess is not supported
- Fix the non-SIMD f16 build
- Fix two-pass SDPA on pre-M2 devices
This release includes CUDA improvements for hardware compatibility and performance tuning.
- CUDA: Fallback Event implementation when there is no hardware CPU/GPU coherency
- CUDA: Tune CUDA graph sizes on B200 and H100
- Docs: Simple example of using MLX distributed (a minimal sketch follows this list)
- Use lower-right causal mask alignment consistently
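Along the lines of that docs item, a minimal sketch of MLX distributed usage (this assumes the mx.distributed API and an external launcher; it is not the example from the docs):

import mlx.core as mx

world = mx.distributed.init()         # join the process group
x = mx.ones((4,)) * world.rank()      # per-process data
y = mx.distributed.all_sum(x)         # element-wise sum across all processes
print(world.rank(), y)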
This release brings major performance improvements for both Metal and CUDA backends with better GPU support.
- Metal: Much faster fused vector grouped-query attention for long contexts
- CUDA: Several improvements to speed up LLM inference
- CUDA: Support for dense MoEs
- CUDA: Better support for consumer GPUs (4090, 5090, RTX 6000, ...)
Related Repositories
Discover similar tools and frameworks used by developers
Docling
Fast document parser for RAG and AI workflows.
crewAI
Python framework for autonomous multi-agent AI collaboration.
X Recommendation Algorithm
X's open-sourced recommendation algorithm for timeline and notification ranking.
PyTorch
Python framework for differentiable tensor computation and deep learning.
Ollama
Go-based CLI for local LLM inference and management.