Train LLMs to reason with Reinforcement Learning Teachers
This repository provides code and instructions for training "Reinforcement Learning Teachers" (RLTs) to improve the reasoning capabilities of Large Language Models (LLMs) during test-time scaling. It's designed for researchers and practitioners in LLM alignment and efficiency who want to replicate or extend the RLT methodology.
How It Works
The project uses reinforcement learning to train specialized "teacher" LLMs. The teachers generate reasoning traces for given problems, and these traces are then used to train "student" LLMs. The aim is to pass the teacher's learned reasoning process on to the students, giving them stronger reasoning skills without running costly reinforcement learning on each student model directly.
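The teacher-to-student flow can be illustrated with a minimal Python sketch that turns teacher traces into student training examples. Everything here is hypothetical: the Problem and StudentExample types, the distill_dataset function, the prompt/completion layout, and the "Final answer:" suffix are illustrative choices, not the repository's actual data format. The teacher is stubbed as a plain callable standing in for a trained RLT model, and handing it the full problem (including the reference solution) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Problem:
    question: str
    solution: str


@dataclass
class StudentExample:
    prompt: str
    completion: str


def distill_dataset(
    problems: Iterable[Problem],
    teacher: Callable[[Problem], str],
) -> list[StudentExample]:
    """Turn teacher-generated reasoning traces into student training examples.

    `teacher` is a hypothetical placeholder for a trained teacher model:
    any callable that maps a problem to a reasoning-trace string.
    """
    examples = []
    for p in problems:
        trace = teacher(p).strip()
        if not trace:
            # Skip problems where the teacher produced nothing usable.
            continue
        # Hypothetical layout: the student sees only the question and learns
        # to reproduce the teacher's reasoning followed by the final answer.
        completion = f"{trace}\n\nFinal answer: {p.solution}"
        examples.append(StudentExample(prompt=p.question, completion=completion))
    return examples


if __name__ == "__main__":
    # A toy stand-in for a trained teacher model.
    toy_teacher = lambda p: "Adding 2 and 3 gives 5."
    data = distill_dataset([Problem("What is 2 + 3?", "5")], toy_teacher)
    print(data[0].prompt)
    print(data[0].completion)
```

The resulting prompt/completion pairs would then feed an ordinary supervised fine-tuning step for the student; how the actual project formats and weights these examples is not covered here.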
Quick Start & Requirements
Install with sh scripts/install_08.sh, or install manually: torch==2.6.0 (pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124), vllm==0.8.3, tensorboard, flash-attn, flashinfer-python, and the packages listed in requirements_08.txt.
Log in to Hugging Face with huggingface-cli login.
Two launch scripts are provided: ./launch.sh (for non-vLLM runs) and ./launch_with_server.sh (for vLLM runs). Example: ./launch_with_server.sh 1 3 cfgs/run_cfg/my_run_file.yaml dataset_id_or_path=my/data/path learning_rate=0.0001.
Limitations & Caveats
Datasets are expected to provide question and solution fields, plus an optional reasoning_trace field.
2 months ago
Inactive