
Higgsfield: GPU orchestration for large-scale model training

Cluster manager for multi-node PyTorch model training.

LIVE RANKINGS • 12:33 PM • STEADY
Overall rank: #400 · AI & ML rank: #105
Stars: 3.6K (+3 in the last 7 days)
Forks: 592 (+2 in the last 7 days)

Learn more about Higgsfield

Higgsfield is a GPU orchestration framework designed to manage and coordinate distributed training of large-scale PyTorch models across multiple nodes. It provides a cluster management layer that handles resource allocation, task scheduling, and inter-node communication for multi-GPU training workloads. The framework abstracts the complexity of distributed computing by automatically configuring process groups, managing data parallelism strategies, and monitoring training jobs across the cluster. It integrates with existing PyTorch training pipelines through a Python API that wraps model and data configurations with distributed execution logic. The system is optimized for high-performance computing environments where training large neural networks requires coordinated use of GPUs spanning multiple physical machines.

1. GitHub-integrated workflows

Experiments are defined as Python code and automatically deployed through GitHub Actions, eliminating separate deployment pipelines. Checkpoints and experiment monitoring occur through GitHub's interface rather than custom dashboards.
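As a sketch of this flow, a push-triggered workflow might look like the fragment below. This is illustrative only: the workflow name, runner label, and entry-point script are assumptions, not files generated by Higgsfield.

```yaml
# Illustrative GitHub Actions workflow; job name, runner label, and
# script path are assumptions, not Higgsfield's generated output.
name: deploy-experiment
on: push
jobs:
  train:
    runs-on: self-hosted   # assumed: a runner with access to the GPU cluster
    steps:
      - uses: actions/checkout@v4
      - name: Launch experiment on the cluster
        run: python experiments/train.py   # assumed entry point
```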

2. ZeRO-3 and FSDP support

Native compatibility with DeepSpeed's ZeRO-3 and PyTorch's Fully Sharded Data Parallel (FSDP) APIs enables efficient parameter sharding for trillion-parameter models without requiring custom implementation work.
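A back-of-the-envelope estimate shows why sharding at this scale is unavoidable. Following the ZeRO paper's accounting, mixed-precision Adam keeps roughly 16 bytes of model state per parameter, and ZeRO-3 shards all of it across devices. The function below is an illustrative calculation, not part of Higgsfield's API.

```python
def zero3_memory_per_gpu_gb(n_params: float, n_gpus: int) -> float:
    """Rough per-GPU memory for model state under ZeRO-3 sharding.

    Mixed-precision Adam holds ~16 bytes per parameter:
    2 (fp16 weights) + 2 (fp16 gradients) + 12 (fp32 master copy,
    momentum, and variance). ZeRO-3 partitions all of it evenly.
    """
    return 16 * n_params / n_gpus / 1e9

# A 1-trillion-parameter model carries ~16 TB of model state in total;
# sharded over 1024 GPUs that is ~15.6 GB per GPU, which fits on a
# single accelerator, whereas the unsharded state fits on none.
print(round(zero3_memory_per_gpu_gb(1e12, 1024), 1))  # prints 15.6
```

Activation memory and communication buffers add to this, but the model-state term alone rules out unsharded training at trillion-parameter scale.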

3. Simplified experiment definition

Training experiments are written as standard Python functions with a decorator, avoiding configuration files or argument parsing. Users can incorporate custom PyTorch code, DeepSpeed, or Accelerate without framework constraints.
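The decorator pattern described above can be sketched as a small registry. The names `experiment` and `run` here are hypothetical stand-ins to show the shape of the pattern, not Higgsfield's actual decorator API.

```python
# Toy sketch of decorator-based experiment registration.
# `experiment`, `run`, and the experiment name are hypothetical.
EXPERIMENTS = {}

def experiment(name: str):
    """Register a plain Python function as a named training experiment."""
    def wrap(fn):
        EXPERIMENTS[name] = fn
        return fn
    return wrap

@experiment("llama-finetune")
def train(lr: float = 3e-4, epochs: int = 2):
    # Arbitrary PyTorch, DeepSpeed, or Accelerate code would go here.
    return f"training with lr={lr} for {epochs} epochs"

def run(name: str, **kwargs):
    """Launcher side: look up a registered experiment and invoke it."""
    return EXPERIMENTS[name](**kwargs)

print(run("llama-finetune", epochs=3))
# prints: training with lr=0.0003 for 3 epochs
```

Because the experiment body is an ordinary function, there are no configuration files or argument-parsing boilerplate between the user's code and the launcher.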


A minimal usage example (the toy dataset and DataLoader are added here for completeness, since the loop needs a `dataloader` to iterate over):

from higgsfield import DistributedTrainer
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: random inputs and targets matching the model's dimensions.
dataset = TensorDataset(torch.randn(256, 1024), torch.randn(256, 512))
dataloader = DataLoader(dataset, batch_size=32)

model = nn.Linear(1024, 512)
# Coordinate 4 processes over NCCL for multi-GPU data parallelism.
trainer = DistributedTrainer(model=model, world_size=4, backend='nccl')

for batch in dataloader:
    loss = trainer.training_step(batch)  # forward pass + loss on this rank
    trainer.backward(loss)               # backprop; gradients synced across ranks
    trainer.step()                       # optimizer update

