Megatron-LM: GPU-optimized transformer training at scale
Library for training large transformer models with distributed computing and GPU-optimized building blocks.
Megatron-LM is an NVIDIA-developed framework for training large-scale transformer models on GPU clusters. The system implements multiple parallelism strategies, including tensor parallelism, pipeline parallelism, data parallelism, and expert parallelism, to distribute model training across many devices. It provides GPU-optimized kernels, mixed-precision support (FP16, BF16, FP8), and memory-management optimizations for efficient large-model training. The framework supports a range of model architectures, including GPT, LLaMA, Mixtral, and Mamba.
Multi-dimensional Parallelism
Implements tensor, pipeline, data, context, and expert parallelism strategies that can be combined to efficiently distribute training across large GPU clusters.
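The way these dimensions compose can be sketched in plain Python (an illustrative helper, not Megatron's actual API): the GPU count factors into the product of the parallelism degrees, and data parallelism takes whatever replicas remain after tensor and pipeline parallelism claim their share.

```python
# Illustrative sketch of how a GPU cluster factors across parallelism
# dimensions. This helper is hypothetical, not part of Megatron-LM.
def data_parallel_size(world_size, tensor_parallel, pipeline_parallel):
    """Data-parallel replicas left after tensor and pipeline
    parallelism partition the model across GPUs."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by TP * PP")
    return world_size // model_parallel

# 64 GPUs with 8-way tensor and 4-way pipeline parallelism
# leave 2 full data-parallel replicas of the model.
print(data_parallel_size(64, 8, 4))  # -> 2
```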
Modular Architecture
Megatron Core provides composable building blocks for transformer components, allowing developers to construct custom training frameworks and model architectures.
GPU Kernel Optimization
Includes specialized CUDA kernels and memory management techniques optimized for NVIDIA hardware, with support for advanced precision formats including FP8.
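To see why reduced precision needs care, here is a small illustration in plain NumPy (not Megatron's kernels): gradients below roughly 6e-8 underflow to zero in FP16, which is the motivation for the loss scaling used in mixed-precision training.

```python
import numpy as np

# A gradient value too small for FP16: it underflows to exactly zero.
grad = 1e-8
assert float(np.float16(grad)) == 0.0

# Loss scaling multiplies the loss (and hence all gradients) by a large
# constant before the backward pass, keeping them representable in FP16;
# they are divided back down in FP32 before the optimizer step.
scale = 4096.0
scaled = np.float16(grad * scale)
assert float(scaled) > 0.0
recovered = float(scaled) / scale  # unscale in FP32
print(recovered)                   # close to the original 1e-8
```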
from megatron import get_args
from megatron.core import mpu
from megatron.model import GPTModel
from megatron.training import pretrain
from megatron.utils import get_ltor_masks_and_position_ids

def model_provider(pre_process=True, post_process=True):
    """Build a GPT model for this pipeline stage. pretrain() passes the
    correct pre_process/post_process flags for each stage."""
    return GPTModel(
        num_tokentypes=0,
        parallel_output=True,
        pre_process=pre_process,
        post_process=post_process)

# pretrain() initializes Megatron (distributed setup, argument parsing),
# builds the model via model_provider, sets up the optimizer and learning
# rate scheduler, and runs the training loop. The dataset provider and
# forward_step callbacks are defined as in pretrain_gpt.py.
pretrain(train_valid_test_datasets_provider,
         model_provider,
         forward_step,
         args_defaults={'tokenizer_type': 'GPT2BPETokenizer'})
Related Repositories
Discover similar tools and frameworks used by developers
DINOv2
PyTorch vision transformers pretrained on 142M unlabeled images.
NAFNet
Efficient PyTorch architecture for image restoration tasks.
CodeFormer
Transformer-based face restoration using vector-quantized codebook lookup.
Unsloth
Memory-efficient Python library for accelerated LLM training.
Codex CLI
OpenAI's command-line coding assistant, with ChatGPT integration, that runs locally in the terminal.