DeepSpeed: Deep learning optimization library for distributed training
PyTorch library for training billion-parameter models efficiently.
Learn more about DeepSpeed
DeepSpeed is a Python library built on PyTorch that optimizes distributed deep learning training and inference through system-level innovations. It implements multiple parallelism approaches including data parallelism, model parallelism, pipeline parallelism, and sequence parallelism, along with memory optimization techniques like ZeRO (Zero Redundancy Optimizer). The library handles communication patterns across GPU clusters and supports offloading to CPU memory and NVMe storage. Common applications include training large language models with billions to trillions of parameters across multi-GPU and multi-node setups.
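Most of this behavior is selected through a JSON configuration rather than model code. Below is a minimal sketch of such a configuration, written as a Python dict (which deepspeed.initialize also accepts in place of a config file path); the batch sizes, precision, and ZeRO stage shown are illustrative placeholders, not recommended values.

# Minimal DeepSpeed configuration sketch; all values are placeholders.
ds_config = {
    "train_batch_size": 32,                 # = micro_batch * grad_accum * data-parallel world size
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,       # assumes 4 data-parallel GPUs: 8 * 1 * 4 = 32
    "fp16": {"enabled": True},              # mixed-precision training
    "zero_optimization": {"stage": 2}       # partition optimizer states and gradients
}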
ZeRO Memory Optimization
Partitions model parameters, gradients, and optimizer states across devices to reduce the per-GPU memory footprint. Trains models up to 8x larger than standard data parallelism allows on the same hardware, without code changes.
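The degree of partitioning is selected with the zero_optimization.stage field. A short sketch of the stage options follows; the comments summarize what each stage partitions, and the overlap_comm flag is an optional tuning knob.

# ZeRO stage selection sketch; higher stages partition more training state.
zero_config = {
    "zero_optimization": {
        # stage 1: partition optimizer states
        # stage 2: also partition gradients
        # stage 3: also partition model parameters
        "stage": 3,
        "overlap_comm": True    # overlap gradient reduction with the backward pass
    }
}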
Hybrid Parallelism Strategies
Combines data, tensor, pipeline, and sequence parallelism through configuration rather than custom implementation. Users select and compose strategies based on model architecture and cluster topology.
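As one example of composing strategies, pipeline parallelism can be layered on top of ZeRO data parallelism by expressing the model as a sequence of layers and wrapping it in DeepSpeed's PipelineModule. The sketch below assumes two pipeline stages, a placeholder ds_config.json, and a user-supplied iterator of (input, label) batches named train_iter; it must be launched with the deepspeed launcher across enough processes to fill the stages.

import torch
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()

# Express the model as a flat list of layers so DeepSpeed can split it
# across pipeline stages (two stages assumed here).
layers = [torch.nn.Linear(1024, 1024) for _ in range(8)]
pipe_model = PipelineModule(layers=layers, num_stages=2,
                            loss_fn=torch.nn.MSELoss())

engine, _, _, _ = deepspeed.initialize(
    model=pipe_model,
    model_parameters=pipe_model.parameters(),
    config="ds_config.json"
)

# Each call pulls micro-batches from the iterator and runs the pipeline schedule.
# loss = engine.train_batch(data_iter=train_iter)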
Multi-Tier Memory Offloading
Automatically manages memory across GPU, CPU, and NVMe storage to train models larger than available VRAM. ZeRO-Infinity extends this to trillion-parameter model training on a single node through intelligent memory orchestration.
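A sketch of a ZeRO-3 configuration with optimizer state offloaded to CPU and parameters offloaded to NVMe, in the ZeRO-Infinity style; the NVMe path is a placeholder and the async I/O and buffer tuning options are omitted.

# ZeRO-Infinity style offloading sketch; /local_nvme is a placeholder path.
offload_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"}
    },
    "bf16": {"enabled": True}
}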
import deepspeed
import torch

# Standard PyTorch model and optimizer
model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.Adam(model.parameters())

# Wrap both in a DeepSpeed engine configured by ds_config.json
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config.json"
)
# Train with model_engine.forward() and model_engine.backward()
Patch release fixing ZeRO-3 memory duplication, optimizer overflow handling, and tensor slicing bugs; no breaking changes noted.
- Update to resolve fp32 weight duplication under torch autocast with ZeRO-3 and fix misplaced overflow handling in the fused optimizer.
- Apply fixes for comm_dtype in large param reduction and 0-dim tensor slicing in state padding to prevent runtime errors.
Patch release fixing a critical illegal memory access bug in multi_tensor_apply and adding tensor learning rate support.
- Update if using multi_tensor_apply with large tensors; fixes illegal memory access when size exceeds INT_MAX.
- Enable tensor learning rates (vs scalar only) and use new DataStates-LLM async checkpointing for large models.
Adds SuperOffload for memory-efficient training, improves DeepCompile ZeRO-3 stability, and fixes universal checkpoint loading bugs in multi-machine and stage-3 scenarios.
- Enable SuperOffload via config to reduce memory footprint during large-model training.
- Update checkpoint loading logic if using ZeRO stage-3 with multiple subgroups or world-size expansion.
Related Repositories
Discover similar tools and frameworks used by developers
goose
LLM-powered agent automating local software engineering workflows.
YOLOX
PyTorch anchor-free object detector with scalable model variants.
AutoGPT
Block-based visual editor for autonomous AI agents.
adk-python
Modular Python framework for building production AI agents.
opencv
Cross-platform C++ library for real-time computer vision algorithms.