nanoGPT: Understand how GPT is trained with minimal code

Stars: 48.4K+. nanoGPT is an open-source project by Andrej Karpathy that aims to provide the simplest and fastest codebase for training or fine-tuning medium-sized GPT models from scratch: roughly 300 lines of model code implement a medium-sized GPT. Built on PyTorch, it is a rewrite of minGPT that prioritizes performance, making it suitable both for beginners getting started with the Transformer architecture and for professional-grade experiments such as reproducing GPT-2 results on OpenWebText.

Today, when large language models have become the mainstream, you may be curious:
“How exactly is a GPT model trained from scratch?”
Karpathy’s open-source project nanoGPT aims to answer this question.

It is not another “toy model,” but a minimalist yet complete GPT training template covering the whole pipeline from data and model architecture to training, and it is one of the best introductory projects for learning modern large-model engineering.

What is nanoGPT?

nanoGPT = The most streamlined GPT training and fine-tuning framework
It allows you to train a miniature GPT on your own data (similar in structure to GPT-2, but with minimalist and highly readable code).

Karpathy is straightforward in the README:
It aims to be the “easiest and fastest” GPT training repository.

It is suitable for:

  • People who want to understand the Transformer training process
  • People who want to train language models locally on consumer-grade GPUs
  • People who want to build their own LLM models or scientific research prototypes
  • People who want to learn the Karpathy engineering style (clean structure, no complex dependencies)

Project core functions

nanoGPT is not a “model zoo”, it is more of a teaching-grade engineering template.
It contains:

1. Data processing (prepare.py, one per dataset under data/).

  • Read the original text (e.g. the complete works of Shakespeare)
  • Convert characters or words into tokens
  • Split into train / val sets
  • Save it in binary format for efficient model loading

Features: Simple, transparent, does not hide any details.
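The character-level flow can be sketched in a few lines. This is a simplified illustration of the approach the Shakespeare prepare script takes; the variable names here are illustrative, not the repo's exact code:

```python
import numpy as np

text = "To be, or not to be, that is the question."

# Build a character-level vocabulary: every unique char gets an integer id
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode(text)
assert decode(ids) == text  # round-trip check

# 90/10 train/val split, saved as uint16 for compact, fast loading
n = int(0.9 * len(ids))
train_ids = np.array(ids[:n], dtype=np.uint16)
val_ids = np.array(ids[n:], dtype=np.uint16)
train_ids.tofile("train.bin")
val_ids.tofile("val.bin")
```

The binary files can then be memory-mapped during training, so batches are read without re-tokenizing anything.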

2. Model (model.py).

nanoGPT reproduces a pure “Decoder-only Transformer”, including:

  • Token embedding
  • Position embedding
  • Multi-head Self-attention
  • MLP feedforward network
  • LayerNorm
  • Residual connection
  • Masked Attention

It has a very small amount of code, but the structure is complete, and you can see for yourself how the key components of GPT come together.
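As an illustration, here is a minimal sketch of one such decoder block in PyTorch. Note the substitution: nanoGPT implements its own CausalSelfAttention class, while this sketch uses the stock nn.MultiheadAttention for brevity, so it is a structural analogy rather than the repo's actual code:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One decoder block: pre-LayerNorm, causal self-attention, MLP, residuals."""
    def __init__(self, n_embd=64, n_head=4, block_size=32):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        # Stand-in for nanoGPT's hand-written CausalSelfAttention
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )
        # Boolean upper-triangular mask: position t may only attend to positions <= t
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T])
        x = x + a                      # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
        return x

x = torch.randn(2, 32, 64)  # (batch, time, embedding)
y = Block()(x)
print(y.shape)  # torch.Size([2, 32, 64])
```

A full GPT simply stacks N of these blocks between the token/position embeddings and a final linear head over the vocabulary.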

3. Training script (train.py).

It supports:

  • Single-GPU and multi-GPU (DDP) training
  • FlashAttention (optional)
  • PyTorch 2.x compile acceleration
  • Reproducible experiment configurations
  • Logging (loss, iteration speed)

All you need to do is:

python train.py config/train_shakespeare_char.py

This trains a “little GPT” on Shakespeare’s text that can generate Shakespeare-style dialogue.
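To see what the training script does conceptually, here is a heavily simplified loop over a fake token stream. The toy model and names are illustrative stand-ins, not nanoGPT's real GPT class:

```python
import torch
import torch.nn as nn

vocab_size, block_size = 65, 8
# Toy stand-in for the GPT: the real model.py defines a full Transformer
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

data = torch.randint(0, vocab_size, (1000,))  # fake token stream

def get_batch(batch_size=4):
    # Sample random windows; targets are the inputs shifted one token left
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x, y

for step in range(50):
    x, y = get_batch()
    logits = model(x)  # (batch, time, vocab)
    loss = loss_fn(logits.view(-1, vocab_size), y.view(-1))
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.3f}")
```

Everything else in train.py (checkpointing, learning-rate schedule, DDP, compile) is wrapped around this same forward → loss → backward → step core.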

4. Inference (sample.py).

Once training is done, you can run:

python sample.py --out_dir=out-shakespeare-char

to generate text.
The output will clearly carry the style of the training set (e.g., the Shakespearean “thou” and “thee” tone).
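The core of sampling is a plain autoregressive loop, similar in spirit to the generate method in model.py. This sketch uses a toy stand-in model instead of a loaded checkpoint:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, block_size = 65, 8
# Toy stand-in; the real sample.py loads a trained checkpoint from --out_dir
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))

@torch.no_grad()
def generate(idx, max_new_tokens, temperature=1.0):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]        # crop to the context window
        logits = model(idx_cond)[:, -1, :]     # logits for the last position only
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1) # append and continue
    return idx

out = generate(torch.zeros(1, 1, dtype=torch.long), max_new_tokens=20)
print(out.shape)  # torch.Size([1, 21])
```

Lowering the temperature makes the distribution sharper and the output more repetitive; raising it makes the text more random.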

Why is nanoGPT so popular?

Because it has three key advantages:

1. The code is simple and transparent

Karpathy often says:

“Educational but still useful.”

It’s not a toy, but a modern GPT project that actually runs end to end, with a minimalist structure.

2. Genuinely trainable, with real engineering value

Rather than theory on paper, you can actually:

  • Train a Shakespeare GPT
  • Train a GPT on your own writing style
  • Fine-tune it into a conversational model
  • Even use it as a product prototype

3. It is a springboard for understanding larger LLMs

After reading nanoGPT, you will understand more easily:

  • GPT-2
  • GPT-J
  • LLaMA series
  • Mistral architecture
  • How FlashAttention works

It also prepares you to go deeper later with resources such as Transformers from Scratch, nanoLLaMA, etc.
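On the FlashAttention point: when PyTorch 2.x is available, nanoGPT routes attention through torch.nn.functional.scaled_dot_product_attention, which dispatches to a fused (Flash-style) kernel when one is available. A small sketch comparing it against the explicit attention formula:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, time, head_dim)
q = torch.randn(2, 4, 16, 8)
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

# Fused causal attention; uses a FlashAttention kernel when available
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Reference: explicit softmax(QK^T / sqrt(d)) with a causal mask
d = q.size(-1)
scores = (q @ k.transpose(-2, -1)) / d**0.5
mask = torch.triu(torch.ones(16, 16, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
ref = torch.softmax(scores, dim=-1) @ v

print(torch.allclose(out, ref, atol=1e-4))
```

The fused path never materializes the full T×T attention matrix, which is exactly why it saves memory and time on long sequences.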

What can you do with nanoGPT?

There are many practical applications:

Train a “little ChatGPT” on your own text

For example: portfolio copy, your favorite author’s tone, study notes, or a WeChat official-account style.

Try to make a “small chatbot prototype”

Fine-tune on small datasets and let it be “your style assistant.”

Understand the whole process of large model training

Includes:
Data → Token → Batch → Attention Calculation → Loss → Optimization → Inference

Reuse engineering structures in your own papers or projects

Because it is clean enough to serve as a research baseline.

Brief description of the code structure

Main files in the repository:

nanoGPT/
├── train.py        # main training script
├── sample.py       # inference / text generation
├── model.py        # GPT model definition
├── configurator.py # lightweight config overrides
├── data/           # example datasets, each with its own prepare.py
└── config/         # training configurations for various runs

It’s so lightweight that you can read the overall architecture in minutes.

Summary: A project that is really worth “reading through the source code”

nanoGPT is the kind of project where a single pass through the source code teaches you:

  • The internal structure of GPT
  • How attention is computed
  • How text data is fed into the model
  • How autoregressive language models are trained
  • How a modern LLM runs end to end in minimal code

If you love technology and also do content creation, it is perfect material for:

  • Blog posts
  • Video explainers
  • Learning tutorials
  • Basic AI project templates

GitHub:https://github.com/karpathy/nanoGPT
