RLT by SakanaAI

Train LLMs to reason with Reinforcement Learning Teachers

Created 3 months ago
340 stars

Top 81.1% on SourcePulse

Project Summary

This repository provides code and instructions for training "Reinforcement Learning Teachers" (RLTs), which improve the reasoning capabilities of Large Language Models (LLMs) in test-time scaling settings. It is aimed at researchers and practitioners in LLM alignment and efficiency who want to replicate or extend the RLT methodology.

How It Works

The project trains specialized "teacher" LLMs with reinforcement learning. Given a problem together with its ground-truth solution, a teacher learns to generate reasoning traces that explain how to reach the solution; these traces are then used to distill "student" LLMs. The aim is to transfer strong reasoning skills to students through supervised distillation, avoiding costly reinforcement learning on the student model itself.
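As a rough, non-authoritative sketch of this data flow (not the repository's actual API; the prompt formats, the `<think>` delimiter, and both helper functions are hypothetical):

```python
# Illustrative sketch of the RLT data flow, not the repo's actual code.
# Prompt formats and the <think> delimiter here are hypothetical.

def teacher_prompt(question: str, solution: str) -> str:
    """RLT teachers see both the question and its ground-truth solution and are
    trained with RL to produce explanations that help a student reach it."""
    return f"Question: {question}\nSolution: {solution}\nExplain step by step:"

def distillation_example(question: str, trace: str, solution: str) -> dict:
    """Students are then fine-tuned (distilled) on the teacher's trace."""
    return {
        "prompt": f"Question: {question}",
        "completion": f"<think>{trace}</think>\n{solution}",
    }

question, solution = "What is 12 * 13?", "156"
trace = "12 * 13 = 12 * (10 + 3) = 120 + 36 = 156"  # produced by the RL-trained teacher
print(teacher_prompt(question, solution))
print(distillation_example(question, trace, solution))
```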

Quick Start & Requirements

  • Installation: recommended via Conda using sh scripts/install_08.sh; alternatively, install manually with pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124, followed by vllm==0.8.3, tensorboard, flash-attn, flashinfer-python, and the packages listed in requirements_08.txt.
  • Prerequisites: Python 3.11, CUDA 12.4, PyTorch 2.6.0, vLLM 0.8.3, FlashAttention, FlashInfer. Requires Hugging Face login (huggingface-cli login).
  • Running Experiments: Uses Hydra for configuration. Launch scripts ./launch.sh (for non-vLLM) and ./launch_with_server.sh (for vLLM) are provided. Example: ./launch_with_server.sh 1 3 cfgs/run_cfg/my_run_file.yaml dataset_id_or_path=my/data/path learning_rate=0.0001.
  • Resources: tested on 8x H100 GPUs; the authors state runs are reproducible on 4 GPUs. Weight/optimizer offloading to CPU is supported.
  • Documentation: Paper, Checkpoints, Blog (a checkpoint-loading sketch follows this list).
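The released checkpoints should load with the standard Hugging Face transformers API. A minimal sketch, assuming standard causal-LM checkpoints; the model ID below is a placeholder, not a real repository name:

```python
# Placeholder model ID -- substitute a real one from the "Checkpoints" link.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/<rlt-checkpoint>"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
```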

Highlighted Details

  • Provides a recipe for training RLTs, extensible for custom datasets and base models.
  • Supports vLLM for efficient generation during RL training (see the sketch after this list).
  • Includes pre-trained RLT models and detailed configuration files.
  • Offers options for distributed training with DeepSpeed and CPU offloading.
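For reference, vLLM also exposes an offline Python API, shown below; note the repository itself launches vLLM as a separate server via ./launch_with_server.sh rather than in-process, and the base model named here is only an assumption:

```python
# Standalone vLLM (0.8.3) generation example -- illustrates the library, not
# the repo's server-based integration. The model choice is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Question: What is 12 * 13? Explain step by step:"], params)
print(outputs[0].outputs[0].text)
```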

Maintenance & Community

  • The project is associated with SakanaAI.
  • The primary paper is cited as Cetin et al., 2025.

Licensing & Compatibility

  • The repository appears to use a permissive license, but the README does not state one explicitly; verify the code license in the repository and the licenses of the individual model checkpoints on Hugging Face before use.

Limitations & Caveats

  • Training is resource-intensive: the reference setup uses 8x H100 GPUs, though the authors state results can be reproduced on 4 GPUs.
  • Custom datasets must use specific column names (question, solution, and an optional reasoning_trace); see the sketch after this list.
  • Larger student models (32B+) may require recollecting multiple reasoning traces to avoid context length issues during distillation.
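A minimal sketch of preparing a custom dataset with the expected columns, using the Hugging Face datasets library; saving with save_to_disk and pointing dataset_id_or_path at the result is an assumption based on the launch example above:

```python
# Hypothetical custom-dataset preparation with the expected column names.
from datasets import Dataset

rows = [
    {
        "question": "What is 12 * 13?",
        "solution": "156",
        "reasoning_trace": "12 * 13 = 12 * (10 + 3) = 120 + 36 = 156",  # optional column
    },
]
Dataset.from_list(rows).save_to_disk("my/data/path")
# Then launch with: dataset_id_or_path=my/data/path (assumption, mirrors the
# example under "Running Experiments").
```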
Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 30 days

