With Tinker and the Tinker Cookbook, you can fine-tune large language models (LLMs) to your specific needs without managing complex training infrastructure. Tinker handles distributed training and uses efficient LoRA adapters to reduce costs and speed up customization. The Cookbook provides ready-to-use examples and tools for tasks like chat, mathematical reasoning, and reinforcement learning, helping you build and improve models quickly.
Based on the thinking-machines-lab/tinker-cookbook repository, this article selects the four examples that are the most fundamental, most important, and most helpful for understanding the Tinker training philosophy: sl_loop.py, rl_loop.py, sl_basic.py, rl_basic.py.
Tinker is a lightweight yet powerful framework for model training and fine-tuning, emphasizing “transparent training loops” and “configurable training flows”.
If you want to truly understand how an LLM is trained, these four examples are enough as an entry point.
1. sl_loop.py: Write a supervised learning (SL) training loop from scratch
Path: tinker_cookbook/recipes/sl_loop.py
Why is this example important?
It shows the training loop in its purest form: no magic, no hidden logic, just a handful of steps:
- Forward pass
- Loss computation
- Backward pass
- Optimizer update (step)
- State saving (save_state)
In other words, this is the Tinker equivalent of the minimal training loop you would write by hand in PyTorch.
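To make those five steps concrete, here is a minimal sketch of the same loop shape on a toy one-parameter model. This is plain Python, not Tinker's actual API: it exists only to show how forward pass, loss, gradient, optimizer step, and state saving fit together.

```python
import json

# Toy dataset: learn y = 2x with a single weight.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # the model's only parameter
lr = 0.05  # learning rate

for step in range(200):
    # Forward pass: compute predictions for the batch.
    preds = [w * x for x, _ in data]
    # Loss computation: mean squared error.
    loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, data)) / len(data)
    # Backward pass: gradient of the MSE with respect to w.
    grad = sum(2 * (p - y) * x for p, (x, y) in zip(preds, data)) / len(data)
    # Optimizer update (step): plain gradient descent.
    w -= lr * grad

# State saving (save_state): persist the trained parameter.
state = json.dumps({"w": w})
```

After 200 steps, `w` converges to 2.0. A real training loop swaps the toy model for an LLM and the manual gradient for autodiff, but the skeleton is the same.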
Who is this example for?
- Developers who want to see how Tinker structures the training loop
- Anyone who wants to understand what really happens during training
- Developers who want to extend the loss, metrics, or logging themselves
This example is deliberately bare-bones, which is exactly what makes it so instructive.
2. rl_loop.py: Write a reinforcement learning (RL) training loop from scratch
Path: tinker_cookbook/recipes/rl_loop.py
Why is this example important?
Reinforcement learning (RL) is often a key part of fine-tuning language models, used to:
- Improve model alignment
- Optimize the quality of the model's responses
- Make the model follow instructions more reliably
In this example, you can see a simple but complete RL loop that includes:
- Sampling rollouts
- Computing rewards
- Updating the policy
It is the essential link in the SFT → RLHF pipeline, i.e. supervised fine-tuning followed by reinforcement fine-tuning.
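The sample → reward → update cycle can be sketched on a problem far simpler than a language model: a two-armed bandit trained with a REINFORCE-style update. This is a conceptual illustration, not Tinker's code; the policy here is just a pair of logits, but the loop structure matches the three bullets above.

```python
import math
import random

random.seed(0)

true_rewards = [0.2, 0.8]  # arm 1 is better
logits = [0.0, 0.0]        # policy parameters
lr = 0.1

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

for step in range(2000):
    probs = softmax(logits)
    # 1. Sample: draw an action from the current policy.
    a = random.choices([0, 1], weights=probs)[0]
    # 2. Compute reward: noisy feedback from the environment.
    r = true_rewards[a] + random.gauss(0, 0.1)
    # 3. Policy update: REINFORCE gradient ascent on log-prob * reward.
    for i in range(2):
        indicator = 1.0 if i == a else 0.0
        logits[i] += lr * r * (indicator - probs[i])

probs = softmax(logits)  # the policy now strongly prefers arm 1
```

In RLHF the "arm" is a sampled response, the reward comes from a reward model or a programmatic check, and the update touches billions of parameters, but the loop is the same shape.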
This example can help you:
- Understand the low-level logic of RLHF
- Understand the basic concepts of policy iteration and reward modeling
- Build a foundation for more complex RL algorithms such as PPO
3. sl_basic.py: Complete a supervised fine-tuning process with configuration
Path: tinker_cookbook/recipes/sl_basic.py
Key point: from handwritten loops to declarative training
If sl_loop.py is “write the training loop yourself”, then sl_basic.py is “write a config and let Tinker run the training for you”.
This is also the core design philosophy of Tinker:
- The training flow stays readable
- Training stages are cleanly separated
- A unified Runner controls the entire lifecycle
In this example, you can see:
- How to define a dataset
- How to load the tokenizer and model
- How to set training parameters (batch size, learning rate, epochs)
- How to run and save the results
This is much closer to how you would actually train a model in a real project.
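The declarative idea can be sketched in a few lines. The field names below are purely illustrative, not Tinker's real configuration schema; the point is that the whole run is described by data, and a runner consumes it.

```python
from dataclasses import dataclass

# Hypothetical config in the declarative style described above.
# Field names and the model name are illustrative assumptions only.
@dataclass
class SLConfig:
    base_model: str
    dataset_path: str
    batch_size: int = 32
    learning_rate: float = 1e-4
    num_epochs: int = 3
    output_dir: str = "checkpoints/"

def run_training(config: SLConfig) -> dict:
    # A real runner would load the dataset and tokenizer, build the
    # training client, loop over epochs, and save checkpoints.
    # Here we only return a summary of what would be run.
    return {
        "model": config.base_model,
        "epochs": config.num_epochs,
        "lr": config.learning_rate,
    }

config = SLConfig(base_model="Qwen/Qwen2.5-7B", dataset_path="data/chat.jsonl")
summary = run_training(config)
```

The win of this style is that the entire experiment is reproducible from one object: change a field, rerun, and compare.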
4. rl_basic.py: Declarative reinforcement learning training process
Path: tinker_cookbook/recipes/rl_basic.py
This example mirrors sl_basic.py, but shows how to complete a reinforcement learning run (such as reward-driven policy optimization) through configuration alone.
Compared to the handwritten loop in rl_loop.py, this version shows:
- How to define the RL policy
- How to set reward functions/sampling strategies
- How to configure the number of steps for an iteration
- How to run the entire RL process smoothly with Runner
You can think of it as the practical version of RL training, closest to what a real project would use.
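An RL config extends the supervised one with sampling and reward settings. As before, this is a hypothetical sketch: the field names, the reward function, and the model name are illustrative assumptions, not Tinker's actual schema.

```python
from dataclasses import dataclass
from typing import Callable

def length_penalty_reward(response: str) -> float:
    # Toy reward function: prefer concise answers (<= 200 characters).
    return 1.0 if len(response) <= 200 else 0.0

# Hypothetical RL config; field names are illustrative only.
@dataclass
class RLConfig:
    base_model: str
    reward_fn: Callable[[str], float]
    samples_per_prompt: int = 4   # rollouts sampled per prompt
    num_iterations: int = 100     # sample -> reward -> update cycles
    learning_rate: float = 1e-5

config = RLConfig(base_model="Qwen/Qwen2.5-7B",
                  reward_fn=length_penalty_reward)

# One conceptual step of an iteration: score a batch of sampled responses.
responses = ["Short answer.", "x" * 300]
rewards = [config.reward_fn(r) for r in responses]
```

Swapping the reward function or sampling budget is then a one-line config change, which is exactly the flexibility a declarative RL recipe is meant to provide.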
How the four examples relate (one line each)
- sl_loop.py: Minimal SFT training loop (fully handwritten)
- rl_loop.py: Minimal RL loop (fully handwritten)
- sl_basic.py: Declarative / automated SFT (for real projects)
- rl_basic.py: Declarative / automated RL (for real projects)
If you want a learning path, I recommend this order:
1 → 2 (understand the underlying logic first)
3 → 4 (then learn the declarative, automated approach)
This way, your understanding of model training is built from the bottom up.
Summary
These four examples from the Tinker Cookbook form the core of a minimalist but complete model training framework:
| Example | Type | Difficulty | Purpose |
|---|---|---|---|
| sl_loop.py | Handwritten SL loop | ⭐⭐ | Understand the low-level training process |
| rl_loop.py | Handwritten RL loop | ⭐⭐⭐ | Understand RLHF basics |
| sl_basic.py | Declarative SL training | ⭐ | Recommended for real projects |
| rl_basic.py | Declarative RL training | ⭐⭐ | Practical RL fine-tuning |
If you plan to train and fine-tune models yourself, these four examples are essentially required reading.
GitHub: https://github.com/thinking-machines-lab/tinker-cookbook