Control reasoning depth in language models
Top 99.4% on SourcePulse
This project introduces L1, a method for controlling the reasoning duration of language models using reinforcement learning. It addresses the challenge of optimizing computational effort for complex tasks by dynamically adjusting how long a model "thinks." The target audience includes AI researchers and engineers working with LLMs on tasks requiring multi-step reasoning, offering improved performance and resource efficiency.
How It Works
L1 employs reinforcement learning to train a policy that governs the number of reasoning steps a model takes. This allows the model to learn when to terminate its reasoning process, rather than relying on a fixed depth. The approach integrates with existing LLM architectures, using RL signals derived from task performance to optimize reasoning duration. This dynamic control is advantageous as it avoids unnecessary computation for simpler problems and enables deeper exploration for complex ones, potentially leading to better accuracy and efficiency.
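As a rough illustration of the kind of reward signal such an approach might use, the minimal sketch below combines a task-correctness term with a penalty for deviating from a requested token budget. The function name, the penalty weight alpha, and the exact functional form are illustrative assumptions, not the project's actual objective; consult the paper for the real formulation.

```python
def length_controlled_reward(is_correct: bool, target_tokens: int,
                             used_tokens: int, alpha: float = 0.0003) -> float:
    """Score a sampled reasoning trace for RL training (illustrative sketch).

    Combines task correctness with a penalty for deviating from the token
    budget requested in the prompt. `alpha` trades accuracy against budget
    adherence; the value here is an assumption for demonstration only.
    """
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_tokens - used_tokens)
    return correctness - length_penalty

# Example: a correct answer that overshoots a 512-token budget by 300 tokens
# still earns a positive, but reduced, reward.
print(length_controlled_reward(True, target_tokens=512, used_tokens=812))
```

Under a reward like this, the model is incentivized both to answer correctly and to match the reasoning length it was asked to use, which is what makes the "thinking time" controllable at inference.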
Quick Start & Requirements
- Install flash-attn (with --no-build-isolation) and verl from its Git repository, followed by the packages in requirements.txt.
- Key dependencies: flash-attn and verl; flash-attn often implies GPU acceleration.
- Repository: https://github.com/cmu-l3/l1.git
- arXiv paper: https://arxiv.org/abs/2503.04697
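Once set up, inference with one of the released checkpoints (listed under Highlighted Details below) might look like the following sketch. The Hugging Face model ID and the "Think for N tokens." prompt suffix are assumptions; check the repository's README for the canonical model IDs and prompt template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "l3lab/L1-Qwen-1.5B-Exact"  # assumed Hugging Face ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumed prompt format: append an explicit token budget to the question.
prompt = "Solve: what is 17 * 24? Think for 256 tokens."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```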
Highlighted Details
- Pre-trained models are released (L1-Qwen-1.5B-Exact, L1-Qwen-1.5B-Max), with evaluation on benchmarks including AIME2025, GPQA, LSAT, and MMLU.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not detail specific limitations, known bugs, or alpha/beta status. The reliance on specific versions of dependencies such as verl may require careful environment management during setup and use.