Tinker Cookbook: The Four Core Training Examples

With Tinker and the Tinker Cookbook, you can fine-tune large language models (LLMs) for your specific needs without managing complex training infrastructure. Tinker handles distributed training and uses efficient LoRA adapters to reduce cost and speed up customization. The Cookbook provides ready-to-use examples and tools for tasks such as chat, mathematical reasoning, and reinforcement learning, helping you quickly build and improve models.

From the thinking-machines-lab/tinker-cookbook repository, this article selects the four examples that are the most fundamental, the most important, and the most helpful for understanding Tinker's training philosophy:
SL Loop, RL Loop, SL Basic, RL Basic

Tinker is a lightweight yet powerful framework for model training and fine-tuning, emphasizing “transparent training loops” and “configurable training flows”.
If you want to truly understand how an LLM is trained, these four examples are enough as an entry point.

1. sl_loop.py: Write a supervised learning (SL) training loop from scratch

Path:
tinker_cookbook/recipes/sl_loop.py

Why is this example important?

It shows the training loop in its purest form: no magic, no hidden logic, just these steps:

  • Forward pass
  • Loss computation
  • Backward pass
  • Optimizer update (step)
  • State saving (save_state)

In other words, this is the Tinker version of the minimal training code you write by hand in PyTorch.
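To make that shape concrete, here is a generic sketch of such a minimal loop in plain PyTorch on a toy model. This is illustrative only, not the Tinker API or the contents of sl_loop.py; it just shows the same forward → loss → backward → step → save structure:

```python
import torch
import torch.nn as nn

# Minimal supervised training loop: forward, loss, backward, step, save.
# Plain PyTorch on a toy model, illustrating the structure sl_loop.py
# makes explicit (this is NOT the Tinker API).
torch.manual_seed(0)

model = nn.Linear(4, 2)                      # stand-in for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Synthetic batch: 8 examples with 4 features, labels in {0, 1}.
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))

losses = []
for step in range(20):
    logits = model(inputs)                   # forward pass
    loss = loss_fn(logits, labels)           # compute loss
    optimizer.zero_grad()
    loss.backward()                          # backward pass
    optimizer.step()                         # optimizer update
    losses.append(loss.item())

torch.save(model.state_dict(), "/tmp/checkpoint.pt")  # save state
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Everything else in a real training stack (distributed execution, LoRA, checkpoint management) is layered on top of exactly this skeleton.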

Who is this example for?

Developers who want to:

  • See how Tinker encapsulates the training loop
  • Understand what actually happens during training
  • Extend the loss, metrics, or logging themselves

This example is deliberately bare-bones, which is exactly what makes it the most educational.

2. rl_loop.py: Write a reinforcement learning (RL) training loop from scratch

Path:
tinker_cookbook/recipes/rl_loop.py

Why is this example important?

Reinforcement learning (RL) is often a key part of fine-tuning language models, used to:

  • Improve model alignment
  • Optimize the quality of the model's responses
  • Make the model follow instructions more reliably

In this example, you can see a simple but complete RL loop that includes:

  • Sampling
  • Reward computation
  • Policy update

It is a must-understand part of the SFT → RLHF pipeline, i.e. supervised fine-tuning followed by reinforcement-learning-based fine-tuning.
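The sample → reward → update cycle can be sketched with a minimal REINFORCE-style loop on a toy two-armed bandit. This is illustrative plain PyTorch, not the Tinker API or the contents of rl_loop.py; arm 1 always pays reward 1.0 and arm 0 pays nothing, so the policy should learn to pick arm 1:

```python
import torch
import torch.nn as nn

# Minimal RL loop in the sample -> reward -> policy-update shape.
# Illustrative REINFORCE on a two-armed bandit (NOT the Tinker API).
torch.manual_seed(0)

logits = nn.Parameter(torch.zeros(2))        # the "policy": one preference per arm
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                            # 1. sample from the policy
    reward = 1.0 if action.item() == 1 else 0.0       # 2. compute the reward
    loss = -dist.log_prob(action) * reward            # 3. policy-gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                  #    policy update

prob_best = torch.softmax(logits, dim=0)[1].item()
print(f"P(best arm) = {prob_best:.2f}")
```

In RLHF the "arm" is a sampled completion and the reward comes from a reward model or rule, but the loop has the same three beats.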

This example can help you:

  • Understand the low-level logic of RLHF
  • Understand the basic concepts of policy iteration and reward modeling
  • Build a foundation for more complex RL training such as PPO

3. sl_basic.py: Complete a supervised fine-tuning process with configuration

Path:
tinker_cookbook/recipes/sl_basic.py

Key point: from handwritten loops to declarative training

If sl_loop.py is "write the training loop yourself", then sl_basic.py is:

You just write a config, and Tinker completes the training automatically.

This is also the core design philosophy of Tinker:

  • The training flow is readable
  • Training steps are separated
  • Control the entire lifecycle with a unified Runner

In this example, you can see:

  • How to define a dataset
  • How to load the tokenizer and model
  • How to set training parameters (batch, lr, epochs)
  • How to run and save the results

This is closer to how you would actually train a model in a real project.
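The declarative pattern can be sketched as a config object handed to a runner. The class, field, and function names below are invented for illustration and do not match sl_basic.py's actual configuration; they only show the idea that dataset, model, and training parameters become plain data:

```python
from dataclasses import dataclass

# Hypothetical sketch of the config-plus-runner pattern that sl_basic.py
# embodies. All names here are illustrative, not Tinker's actual API.

@dataclass
class SupervisedConfig:
    model_name: str = "meta-llama/Llama-3.2-1B"   # which base model to load
    dataset: str = "my_chat_dataset"              # which dataset to train on
    learning_rate: float = 1e-4
    batch_size: int = 32
    num_epochs: int = 1
    save_path: str = "/tmp/sft_run"

def run_supervised(config: SupervisedConfig) -> dict:
    """Stand-in runner: the real recipe would load the tokenizer and model,
    iterate over the dataset, and save checkpoints to save_path."""
    steps_per_epoch = 100  # would come from len(dataset) // batch_size
    total_steps = steps_per_epoch * config.num_epochs
    return {"model": config.model_name, "total_steps": total_steps}

result = run_supervised(SupervisedConfig(num_epochs=3))
print(result["total_steps"])  # 300
```

The value of this split is that the training flow stays readable: every knob lives in one place, and the loop itself never needs to be touched.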

4. rl_basic.py: Declarative reinforcement learning training process

Path:
tinker_cookbook/recipes/rl_basic.py

This example is similar to sl_basic.py, but shows:

How to complete a reinforcement learning training run (such as reward-driven policy optimization) through configuration.

Compared with the handwritten loop in rl_loop.py, this version shows:

  • How to define the RL policy
  • How to set the reward function and sampling strategy
  • How to configure the number of steps per iteration
  • How to run the entire RL process with the Runner

You can think of it as the practical version of RL training, closer to what a real project looks like.
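As with the supervised case, the RL setup can be sketched as configuration plus a runner. Again, every name below is invented for illustration and does not match rl_basic.py's actual API; the point is that the policy model, reward function, sampling settings, and step counts are all plain data:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a declarative RL setup in the spirit of
# rl_basic.py. All names are illustrative, not Tinker's actual API.

def length_penalty_reward(completion: str) -> float:
    """Toy reward function: prefer short completions (illustrative only)."""
    return max(0.0, 1.0 - len(completion) / 100.0)

@dataclass
class RLConfig:
    model_name: str = "meta-llama/Llama-3.2-1B"
    reward_fn: Callable[[str], float] = length_penalty_reward
    samples_per_prompt: int = 4       # sampling strategy
    num_iterations: int = 10          # outer RL iterations
    steps_per_iteration: int = 8      # policy updates per iteration

def run_rl(config: RLConfig) -> dict:
    """Stand-in runner: sample completions, score them with reward_fn,
    then update the policy -- repeated num_iterations times."""
    total_updates = config.num_iterations * config.steps_per_iteration
    sample_reward = config.reward_fn("Short answer.")
    return {"total_updates": total_updates, "sample_reward": sample_reward}

result = run_rl(RLConfig())
print(result["total_updates"])  # 80
```

Swapping the reward function or sampling strategy then becomes a one-line config change rather than a rewrite of the loop.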

Relationship between four examples (summary in one sentence)

  • sl_loop.py: Minimal SFT training loop (fully handwritten)
  • rl_loop.py: Minimal RL loop (fully handwritten)
  • sl_basic.py: Declarative / Automated SFT (for actual projects)
  • rl_basic.py: Declarative / automated RL (for real RL workloads)

If you want a learning path, I recommend this order:

1 → 2 (understand the underlying logic first)

3 → 4 (then learn the declarative, automated approach)

This way, your understanding of model training is built bottom-up.

Summary

These four examples from the Tinker Cookbook form the core of a minimalist but complete model training framework:

| Example | Type | Difficulty | Use |
| --- | --- | --- | --- |
| sl_loop.py | SL handwritten loop | ⭐⭐ | Understand the low-level training process |
| rl_loop.py | RL handwritten loop | ⭐⭐⭐ | Understand RLHF basics |
| sl_basic.py | SL declarative training | | Recommended for real projects |
| rl_basic.py | RL declarative training | ⭐⭐ | Practical RL fine-tuning |

If you plan to train and fine-tune models yourself, these four examples are essentially required reading.

GitHub: https://github.com/thinking-machines-lab/tinker-cookbook
