nanoGPT: GPT training and finetuning codebase
Minimal PyTorch implementation for training GPT models.
nanoGPT is a Python-based training framework for GPT-scale language models built on PyTorch. Its core is two files of roughly 300 lines each: the training loop (train.py) and the model definition (model.py), with support for loading pretrained GPT-2 weights from OpenAI. The codebase handles data preprocessing, distributed training on multi-GPU setups, and checkpoint management with optional Weights & Biases logging. It is used for training models ranging from character-level networks on small datasets to reproducing GPT-2 (124M parameters) on large text corpora like OpenWebText.
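As a sketch of how runs are configured: nanoGPT's config files are plain Python assignments that train.py executes to override its defaults, which is where the dataset, checkpoint directory, and optional Weights & Biases logging are set. The variable names below mirror those used in the repository's example configs (e.g. config/train_shakespeare_char.py), but treat the exact names and values as illustrative rather than definitive.

```python
# Hypothetical config file, e.g. config/train_my_dataset.py (illustrative).
# train.py defines defaults, then executes the config so these assignments
# override them for a particular run.

out_dir = 'out-my-dataset'        # where checkpoints are written
dataset = 'shakespeare_char'      # expects data/<dataset>/train.bin and val.bin

# model size (a small character-level model)
n_layer = 6
n_head = 6
n_embd = 384
block_size = 256
dropout = 0.2

# optimization
batch_size = 64
learning_rate = 1e-3
max_iters = 5000
lr_decay_iters = 5000

# optional Weights & Biases logging
wandb_log = False
wandb_project = 'nanogpt'
wandb_run_name = 'mini-gpt'
```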
Minimal codebase
The core training and model logic is contained in two approximately 300-line files, making the implementation straightforward to understand and modify without abstraction layers.
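Because everything lives in model.py, the model can also be instantiated and exercised directly. The following is a minimal sketch assuming model.py is on the import path; the GPTConfig fields and the (logits, loss) return convention follow the repository's code, but verify the exact names against model.py.

```python
# Minimal sketch: build a small GPT from model.py and run one forward pass.
import torch
from model import GPT, GPTConfig  # assumes nanoGPT's model.py is importable

config = GPTConfig(
    block_size=128,   # maximum sequence length
    vocab_size=50304,
    n_layer=4,
    n_head=4,
    n_embd=256,
    dropout=0.0,
    bias=False,
)
model = GPT(config)

idx = torch.randint(0, config.vocab_size, (2, 128))      # (batch, time) token ids
targets = torch.randint(0, config.vocab_size, (2, 128))
logits, loss = model(idx, targets)                        # forward returns (logits, loss)
print(logits.shape, loss.item())
```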
Pretrained weight loading
Can load official GPT-2 weights from OpenAI and finetune them on custom datasets, supporting GPT-2 sizes from 124M up to GPT-2 XL (1.5B parameters) as a starting point.
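A minimal sketch of using pretrained weights as a finetuning starting point, again assuming model.py is importable; the from_pretrained classmethod and its dropout override follow the pattern used in the repository, but treat the exact signature as an assumption to check.

```python
# Sketch: load OpenAI GPT-2 weights as the initialization for finetuning.
import torch
from model import GPT  # assumes nanoGPT's model.py is importable

# GPT-2 checkpoint sizes: 'gpt2' (124M), 'gpt2-medium' (350M),
# 'gpt2-large' (774M), 'gpt2-xl' (1.5B)
model = GPT.from_pretrained('gpt2', dict(dropout=0.0))  # fetches weights via HuggingFace
model.to('cuda' if torch.cuda.is_available() else 'cpu')
```

In the repository's own workflow, the finetuning configs achieve the same thing by setting init_from to one of these checkpoint names and training with a small learning rate.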
Multi-GPU training
Supports distributed training across multiple GPUs with configuration files for different hardware setups, from CPU-only machines to multi-A100 nodes.
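The multi-GPU path rests on PyTorch's standard DistributedDataParallel setup, launched with torchrun so that rank and device environment variables are populated. The sketch below shows that generic pattern with a stand-in model; the launch command and variable names are illustrative of the repo's approach, not copied from it.

```python
# Sketch of the DDP pattern a torchrun launch enables, e.g.:
#   torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py
import os
import torch
from torch.distributed import init_process_group, destroy_process_group
from torch.nn.parallel import DistributedDataParallel as DDP

ddp = int(os.environ.get('RANK', -1)) != -1   # RANK is set only under torchrun
if ddp:
    init_process_group(backend='nccl')
    local_rank = int(os.environ['LOCAL_RANK'])
    device = f'cuda:{local_rank}'
    torch.cuda.set_device(device)
else:
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = torch.nn.Linear(8, 8).to(device)      # stand-in for the GPT model
if ddp:
    model = DDP(model, device_ids=[local_rank])

# ... training loop: DDP all-reduces gradients across GPUs on backward ...

if ddp:
    destroy_process_group()
```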
Related Repositories
Discover similar tools and frameworks used by developers
Open Notebook
Open source implementation of Google's NotebookLM that runs locally with document processing and podcast generation.
X Recommendation Algorithm
Open source implementation of X's recommendation algorithm for timeline and notification ranking.
ADK
Modular Python framework for building production AI agents.
MMDetection
Modular PyTorch framework for object detection research and deployment.
PaddleOCR
Multilingual OCR toolkit with document structure extraction.