
MLX: Array framework for Apple silicon machine learning

Lazy-evaluated NumPy-like arrays optimized for Apple silicon.

OVERALL RANK: #125 (AI & ML: #52)
STARS: 24.1K (+131 in 7 days)
FORKS: 1.5K (+16 in 7 days)

Learn more about MLX

MLX is an array computation framework designed specifically for machine learning workloads on Apple silicon processors. It implements lazy evaluation with dynamic graph construction, allowing computations to be deferred until results are needed and computation graphs to adapt to changing input shapes without recompilation. The framework features a unified memory model where arrays reside in shared memory accessible by both CPU and GPU devices without explicit data transfers. Common use cases include training transformer models, fine-tuning large language models with techniques like LoRA, and running inference for generative tasks on macOS and iOS devices.

MLX

1. Unified memory architecture

Arrays live in shared memory accessible by both CPU and GPU without requiring explicit data transfers between devices. Operations can transparently execute on either device type.

2. Dynamic graph construction

Computation graphs are built at runtime rather than pre-compiled, allowing function arguments to change shapes without triggering recompilation or requiring graph tracing.

3. Multi-language API consistency

Python, C++, C, and Swift APIs closely mirror each other, with higher-level packages like mlx.nn and mlx.optimizers following PyTorch conventions for model building.


import mlx.core as mx

a = mx.array([1, 2, 3, 4])
b = mx.array([5, 6, 7, 8])
c = a + b  # lazy: only records the operation, nothing is computed yet
print(c)   # printing needs the values, which triggers evaluation

v0.30.6

This release significantly improves bandwidth performance with JACCL on macOS 26.3+ and includes various fixes.

  • Much faster bandwidth with JACCL on macOS >= 26.3
  • Disable managed memory on WSL when concurrentManagedAccess is not supported
  • Fix non-SIMD f16 build
  • Fix 2pass sdpa on < M2
v0.30.5

This release includes CUDA improvements for hardware compatibility and performance tuning.

  • CUDA: Fallback Event impl when there is no hardware cpu/gpu coherency
  • Tune CUDA graph sizes on B200 and H100
  • Docs: Simple example of using MLX distributed
  • Use lower-right causal mask alignment consistently
v0.30.4

This release brings major performance improvements for both Metal and CUDA backends with better GPU support.

  • Metal: Much faster vector fused grouped-query attention for long context
  • CUDA: Several improvements to speed up LLM inference for CUDA backend
  • CUDA: Support for dense MoEs
  • CUDA: Better support for consumer GPUs (4090, 5090, RTX 6000,...)
