
nanoGPT: GPT training and finetuning codebase

Minimal PyTorch implementation for training GPT models.

Rankings: Overall #38 · AI & ML #23
[30-day ranking trend chart: overall rank and AI & ML rank]
Stars: 53.8K · Forks: 9.1K · 7-day stars: +506 · 7-day forks: +81

Learn more about nanoGPT

nanoGPT is a Python training framework for GPT-style language models, built on PyTorch. The training loop (train.py) and model definition (model.py) are each roughly 300 lines, and models can be initialized from OpenAI's pretrained GPT-2 weights. The codebase handles data preprocessing, distributed multi-GPU training, and checkpoint management, with optional Weights & Biases logging. It is used for everything from training character-level models on small datasets to reproducing GPT-2 (124M parameters) on large text corpora such as OpenWebText.
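
The sketch below is illustrative rather than code from the repo: it builds a small model directly through model.py and runs one training step on random tokens, to show the shape of what the ~300-line train.py does. It assumes the GPTConfig and GPT interfaces as they appear in the upstream model.py (field names and return values may differ across versions) and that it is run from the nanoGPT root.

# Illustrative sketch only, not code from the repo. Assumes the GPTConfig/GPT
# interfaces of the upstream model.py; run from the nanoGPT root directory.
import torch
from model import GPT, GPTConfig  # nanoGPT's ~300-line model definition

# A small configuration in the spirit of the character-level examples.
config = GPTConfig(block_size=256, vocab_size=65, n_layer=4, n_head=4,
                   n_embd=128, dropout=0.0, bias=False)
model = GPT(config)

# Plain AdamW here for brevity; the actual train.py sets up its optimizer
# with weight-decay parameter groups.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One forward/backward step on random token ids, mirroring the core of the training loop.
x = torch.randint(0, config.vocab_size, (8, config.block_size))
y = torch.randint(0, config.vocab_size, (8, config.block_size))
logits, loss = model(x, y)  # the upstream model returns (logits, loss) when targets are given
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.4f}")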

nanoGPT

1. Minimal codebase

The core training and model logic is contained in two files of roughly 300 lines each (train.py and model.py), making the implementation straightforward to understand and modify without layers of abstraction.

2. Pretrained weight loading

nanoGPT can load official GPT-2 weights from OpenAI and finetune them on custom datasets, supporting model sizes up to GPT-2 XL (about 1.5B parameters) as a starting point; a config sketch follows this list.

3. Multi-GPU training

Training can be distributed across multiple GPUs, with configuration files for different hardware setups ranging from CPU-only machines to multi-A100 nodes (also covered in the sketch below).
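
Both finetuning from pretrained weights and multi-GPU runs are typically driven through nanoGPT's configuration files: plain Python files whose assignments override defaults in train.py. The file name and values below are hypothetical placeholders, and the variable names are assumed to match the upstream configs (compare config/finetune_shakespeare.py and config/train_gpt2.py in the repo), so treat this as a sketch rather than a drop-in config.

# config/finetune_custom.py -- hypothetical example, not a file shipped with the repo.
# Variable names are assumed to match the overridable globals in train.py.

out_dir = 'out-finetune-custom'
dataset = 'my_dataset'            # placeholder; assumes a prepared dataset under data/my_dataset/
init_from = 'gpt2'                # start from OpenAI GPT-2 weights ('gpt2' up to 'gpt2-xl', ~1.5B params)

batch_size = 4                    # per-GPU micro-batch
gradient_accumulation_steps = 8   # accumulate gradients to reach a larger effective batch
max_iters = 2000
learning_rate = 3e-5              # a small learning rate is typical when finetuning
decay_lr = False

wandb_log = False                 # set True to enable Weights & Biases logging

# Single GPU:
#   python train.py config/finetune_custom.py
# Multi-GPU on one node (DDP via torchrun, as in the upstream README):
#   torchrun --standalone --nproc_per_node=8 train.py config/finetune_custom.py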

