nanoGPT: GPT training and finetuning codebase
Minimal PyTorch implementation for training GPT models.
About nanoGPT
nanoGPT is a Python-based training framework for GPT-scale language models built on PyTorch. It consists of approximately 300 lines each for the training loop (train.py) and model definition (model.py), with support for loading pretrained GPT-2 weights from OpenAI. The codebase handles data preprocessing, distributed training on multi-GPU setups, and checkpoint management with optional Weights & Biases logging. It is used for training models ranging from character-level networks on small datasets to reproducing GPT-2 (124M parameters) on large text corpora like OpenWebText.
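For instance, each dataset folder in data/ carries a prepare.py script that tokenizes raw text into train.bin and val.bin files, which train.py then reads during training. A rough sketch of the OpenWebText pipeline with checkpointing and Weights & Biases logging, assuming the stock config and path names in the repository:
# download OpenWebText and tokenize it into data/openwebtext/train.bin and val.bin
python data/openwebtext/prepare.py
# train, writing checkpoints to the configured out_dir and logging metrics to W&B
python train.py config/train_gpt2.py --wandb_log=True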
Minimal codebase
The core training and model logic is contained in two approximately 300-line files, making the implementation straightforward to understand and modify without abstraction layers.
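Because hyperparameters are plain module-level variables in train.py (collected by a small configurator), any of them can be overridden from a config file or directly on the command line. A sketch, assuming the bundled character-level Shakespeare config:
# values in the config file override train.py defaults; --key=value flags override both
python train.py config/train_shakespeare_char.py --batch_size=32 --compile=False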
Pretrained weight loading
Can load official GPT-2 weights from OpenAI and finetune them on custom datasets, supporting all released checkpoint sizes from the 124M-parameter gpt2 up to the 1.5B-parameter gpt2-xl as a starting point.
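A sketch of finetuning from downloaded GPT-2 weights, assuming the bundled Shakespeare finetuning config (which initializes from a pretrained checkpoint via its init_from setting); sample.py can also generate from a pretrained model without any training:
# tokenize the finetuning data, then start training from OpenAI GPT-2 weights
python data/shakespeare/prepare.py
python train.py config/finetune_shakespeare.py
# generate text directly from the pretrained gpt2-xl checkpoint
python sample.py --init_from=gpt2-xl --start="What is the meaning of life?"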
Multi-GPU training
Supports distributed data-parallel (DDP) training across multiple GPUs, with configuration files for different hardware setups ranging from CPU-only machines to multi-A100 nodes.
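Multi-GPU runs go through PyTorch's torchrun launcher; a sketch for a single node with 8 GPUs, assuming the GPT-2 reproduction config (the same script runs unmodified on a single GPU or CPU):
# DDP training on one node with 8 GPUs
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py
# multi-node runs add the usual torchrun rendezvous flags (--nnodes, --node_rank, --master_addr, ...)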
git clone https://github.com/karpathy/nanoGPT.git
cd nanoGPT
pip install torch numpy transformers datasets tiktoken wandb tqdm
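With the dependencies installed, a minimal end-to-end sanity check on the tiny character-level Shakespeare dataset might look like this (dataset, config, and output-directory names assume the stock repository layout):
# tokenize the tiny Shakespeare corpus into train.bin and val.bin
python data/shakespeare_char/prepare.py
# train a small character-level model; checkpoints go to out-shakespeare-char/
python train.py config/train_shakespeare_char.py
# sample from the trained checkpoint
python sample.py --out_dir=out-shakespeare-char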