l1 by cmu-l3

Control reasoning depth in language models

Created 6 months ago
253 stars

Top 99.4% on SourcePulse

Project Summary

This project introduces L1, a method for controlling the reasoning duration of language models using reinforcement learning. It addresses the challenge of optimizing computational effort for complex tasks by dynamically adjusting how long a model "thinks." The target audience includes AI researchers and engineers working with LLMs for tasks requiring multi-step reasoning, offering benefits in improved performance and resource efficiency.

How It Works

L1 employs reinforcement learning to train a policy that governs the number of reasoning steps a model takes. This allows the model to learn when to terminate its reasoning process, rather than relying on a fixed depth. The approach integrates with existing LLM architectures, using RL signals derived from task performance to optimize reasoning duration. This dynamic control is advantageous as it avoids unnecessary computation for simpler problems and enables deeper exploration for complex ones, potentially leading to better accuracy and efficiency.
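
The training signal described above can be pictured as a correctness score minus a penalty for missing a requested reasoning budget. The sketch below is a minimal illustration of that idea, not the repository's implementation; the function name, the penalty coefficient alpha, and the exact penalty shape are assumptions.

    def length_controlled_reward(is_correct: bool,
                                 used_tokens: int,
                                 target_tokens: int,
                                 alpha: float = 0.0003) -> float:
        """Score one sampled reasoning trace for RL training.

        A correct answer earns reward 1.0; every token of deviation from
        the requested budget subtracts alpha, so the policy learns to hit
        the target length rather than always reasoning longer. All names
        and the alpha value are illustrative, not the project's API.
        """
        correctness = 1.0 if is_correct else 0.0
        return correctness - alpha * abs(target_tokens - used_tokens)

    # A correct trace that used 900 tokens against a 1000-token budget:
    print(length_controlled_reward(True, used_tokens=900, target_tokens=1000))
    # -> 0.97

A reward of this shape penalizes both overshooting and undershooting the budget, which is what lets the trained model stop early on easy problems while still using the full budget on hard ones.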

Quick Start & Requirements

  • Primary install / run command: clone the repository, create and activate a Conda environment with Python 3.12, install flash-attn (with --no-build-isolation) and verl from its Git repository, then install the remaining dependencies from requirements.txt. A command sketch follows this list.
  • Non-default prerequisites and dependencies: Conda, Python 3.12, flash-attn, and verl. flash-attn requires a CUDA-capable NVIDIA GPU to build and run.
  • Estimated setup time or resource footprint: not specified; expect the overhead typical of LLM projects, including compiling flash-attn and downloading multi-gigabyte model weights for evaluation.
  • Links: GitHub repository: https://github.com/cmu-l3/l1.git. arXiv paper: https://arxiv.org/abs/2503.04697.
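
Put together, the steps above would look roughly like the following. This is a sketch assuming an environment named l1 and the upstream verl repository; the README may pin a specific verl fork or commit, so check it before copying these commands.

    git clone https://github.com/cmu-l3/l1.git
    cd l1
    conda create -n l1 python=3.12
    conda activate l1
    pip install flash-attn --no-build-isolation
    pip install git+https://github.com/volcengine/verl.git   # exact verl source per the README
    pip install -r requirements.txt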

Highlighted Details

  • Enables dynamic control over reasoning depth in LLMs via reinforcement learning.
  • Provides replication scripts for specific models (L1-Qwen-1.5B-Exact, L1-Qwen-1.5B-Max) and evaluation on benchmarks including AIME2025, GPQA, LSAT, and MMLU (see the usage sketch after this list).
  • Built upon foundational work from DeepSeek, Qwen, and Agentica, leveraging their models and codebases.
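
If the released checkpoints follow standard Hugging Face packaging, running one could look like the sketch below. The model identifier and the "Think for N tokens" prompt suffix are assumptions for illustration; consult the repository for the published model paths and exact prompt format.

    from transformers import pipeline

    # Hypothetical model path: the README names L1-Qwen-1.5B-Exact, but the
    # published Hugging Face identifier may differ.
    generator = pipeline("text-generation", model="cmu-l3/L1-Qwen-1.5B-Exact")

    # L1-style models read the reasoning budget from the prompt itself;
    # the exact phrasing of the budget instruction is illustrative.
    prompt = "What is 17 * 24? Think for 512 tokens."
    print(generator(prompt, max_new_tokens=1024)[0]["generated_text"])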

Maintenance & Community

  • The project acknowledges contributions and codebases from DeepSeek, Qwen, and Agentica.
  • No specific community links (Discord, Slack) or roadmap details are provided in the README.

Licensing & Compatibility

  • The license type is not explicitly stated in the provided README text.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or alpha/beta status. The reliance on specific versions of dependencies like verl might require careful management during setup and use.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
