nanoGPT: GPT training and finetuning codebase
Minimal PyTorch implementation for training GPT models.
nanoGPT is a Python-based training framework for GPT-scale language models built on PyTorch. It consists of approximately 300 lines each for the training loop (train.py) and model definition (model.py), with support for loading pretrained GPT-2 weights from OpenAI. The codebase handles data preprocessing, distributed training on multi-GPU setups, and checkpoint management with optional Weights & Biases logging. It is used for training models ranging from character-level networks on small datasets to reproducing GPT-2 (124M parameters) on large text corpora like OpenWebText.
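As a flavor of the data preprocessing step, the character-level datasets are prepared by building a vocabulary of unique characters and encoding the text to integer ids. The snippet below is a minimal stdlib-only sketch of that idea (the repo's actual `prepare.py` scripts additionally write train/val splits to binary files with numpy):

```python
# Minimal sketch of character-level preprocessing, in the spirit of
# nanoGPT's data/<dataset>/prepare.py scripts (illustrative, not the
# real code): build a char vocabulary and encode text to integer ids.
text = "hello nanogpt"

# vocabulary: every unique character, sorted for a stable mapping
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # string -> int
itos = {i: ch for ch, i in stoi.items()}       # int -> string

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

ids = encode(text)
assert decode(ids) == text  # lossless round trip
```

The trained model then predicts the next id in such a sequence; decoding maps generated ids back to text.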
Minimal codebase
The core training and model logic is contained in two approximately 300-line files, making the implementation straightforward to understand and modify without abstraction layers.
Pretrained weight loading
Can load official GPT-2 weights from OpenAI and finetune them on custom datasets, supporting checkpoints up to GPT-2 XL (1.5B parameters) as a starting point.
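One wrinkle in loading these checkpoints is that OpenAI's release stores certain weights for a "Conv1D" module, which are transposed relative to `torch.nn.Linear`, so they must be flipped while copying. The sketch below illustrates that conversion with plain nested lists standing in for tensors; it is not nanoGPT's actual loading code, though the list of transposed parameter names follows the convention in `model.py`'s `from_pretrained`:

```python
# Illustrative sketch of the GPT-2 checkpoint quirk: Conv1D-style
# weights are stored transposed relative to nn.Linear and must be
# flipped when copied into a Linear-based model.

# parameter names whose weights need transposing
TRANSPOSED = ('attn.c_attn.weight', 'attn.c_proj.weight',
              'mlp.c_fc.weight', 'mlp.c_proj.weight')

def transpose(matrix):
    """Transpose a matrix represented as a list of rows."""
    return [list(row) for row in zip(*matrix)]

def convert(checkpoint):
    """Copy a {name: matrix} checkpoint, transposing Conv1D weights."""
    out = {}
    for name, w in checkpoint.items():
        out[name] = transpose(w) if name.endswith(TRANSPOSED) else w
    return out

ckpt = {'h.0.attn.c_attn.weight': [[1, 2, 3], [4, 5, 6]],  # shape (2, 3)
        'h.0.ln_1.weight': [[7, 8]]}                        # left as-is
converted = convert(ckpt)
```

In the real code the same idea is applied tensor-by-tensor with `.t()` while copying the Hugging Face state dict into nanoGPT's module layout.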
Multi-GPU training
Supports distributed training across multiple GPUs with configuration files for different hardware setups, from CPU-only machines to multi-A100 nodes.
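The repo's documented launch pattern uses `torchrun` for distributed data parallel training; commands of roughly this shape (config paths per the repo's `config/` directory) cover the single-GPU and multi-GPU cases:

```sh
# single GPU (or CPU, with the appropriate config overrides)
python train.py config/train_gpt2.py

# one node with 8 GPUs, via PyTorch DDP
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py
```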
Related Repositories
Ray
Unified framework for scaling AI and Python applications from laptops to clusters with distributed runtime.
AI-Trader
LLM agent benchmarking framework for autonomous market trading.
Model Context Protocol Servers
Reference implementations for LLM tool and data integration.
ADK
Modular Python framework for building production AI agents.
OpenHands
LLM agent framework automating development in sandboxed containers.