l1 by cmu-l3

Control reasoning depth in language models

Created 6 months ago
253 stars

Top 99.4% on SourcePulse

Project Summary

This project introduces L1, a method for controlling the reasoning duration of language models using reinforcement learning. It addresses the challenge of optimizing computational effort for complex tasks by dynamically adjusting how long a model "thinks." The target audience includes AI researchers and engineers working with LLMs for tasks requiring multi-step reasoning, offering benefits in improved performance and resource efficiency.

How It Works

L1 employs reinforcement learning to train a policy that governs the number of reasoning steps a model takes. This allows the model to learn when to terminate its reasoning process, rather than relying on a fixed depth. The approach integrates with existing LLM architectures, using RL signals derived from task performance to optimize reasoning duration. This dynamic control is advantageous as it avoids unnecessary computation for simpler problems and enables deeper exploration for complex ones, potentially leading to better accuracy and efficiency.
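
The training signal described above can be pictured as a correctness score minus a penalty for missing a requested reasoning budget. The sketch below is a minimal illustration of that idea, not the repository's implementation; the function name, the penalty coefficient alpha, and the exact penalty shape are assumptions.

    def length_controlled_reward(is_correct: bool,
                                 used_tokens: int,
                                 target_tokens: int,
                                 alpha: float = 0.0003) -> float:
        """Score one sampled reasoning trace for RL training.

        A correct answer earns reward 1.0; every token of deviation from
        the requested budget subtracts alpha, so the policy learns to hit
        the target length rather than always reasoning longer. All names
        and the alpha value are illustrative, not the project's API.
        """
        correctness = 1.0 if is_correct else 0.0
        return correctness - alpha * abs(target_tokens - used_tokens)

    # A correct trace that used 900 tokens against a 1000-token budget:
    print(length_controlled_reward(True, used_tokens=900, target_tokens=1000))
    # -> 0.97

A reward of this shape penalizes both overshooting and undershooting the budget, which is what lets the trained model stop early on easy problems while still using the full budget on hard ones.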

Quick Start & Requirements

  • Primary install / run command: clone the repository, create and activate a Conda environment with Python 3.12, install flash-attn (with --no-build-isolation) and verl from its Git repository, then install the remaining dependencies from requirements.txt. A command sketch follows this list.
  • Non-default prerequisites and dependencies: Conda, Python 3.12, flash-attn, and verl. flash-attn requires a CUDA-capable NVIDIA GPU to build and run.
  • Estimated setup time or resource footprint: not specified; expect the overhead typical of LLM projects, including compiling flash-attn and downloading multi-gigabyte model weights for evaluation.
  • Links: GitHub repository: https://github.com/cmu-l3/l1.git. arXiv paper: https://arxiv.org/abs/2503.04697.
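
Put together, the steps above would look roughly like the following. This is a sketch assuming an environment named l1 and the upstream verl repository; the README may pin a specific verl fork or commit, so check it before copying these commands.

    git clone https://github.com/cmu-l3/l1.git
    cd l1
    conda create -n l1 python=3.12
    conda activate l1
    pip install flash-attn --no-build-isolation
    pip install git+https://github.com/volcengine/verl.git   # exact verl source per the README
    pip install -r requirements.txt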

Highlighted Details

  • Enables dynamic control over reasoning depth in LLMs via reinforcement learning.
  • Provides replication scripts for specific models (L1-Qwen-1.5B-Exact, L1-Qwen-1.5B-Max) and evaluation on benchmarks including AIME2025, GPQA, LSAT, and MMLU (see the usage sketch after this list).
  • Built upon foundational work from DeepSeek, Qwen, and Agentica, leveraging their models and codebases.
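
If the released checkpoints follow standard Hugging Face packaging, running one could look like the sketch below. The model identifier and the "Think for N tokens" prompt suffix are assumptions for illustration; consult the repository for the published model paths and exact prompt format.

    from transformers import pipeline

    # Hypothetical model path: the README names L1-Qwen-1.5B-Exact, but the
    # published Hugging Face identifier may differ.
    generator = pipeline("text-generation", model="cmu-l3/L1-Qwen-1.5B-Exact")

    # L1-style models read the reasoning budget from the prompt itself;
    # the exact phrasing of the budget instruction is illustrative.
    prompt = "What is 17 * 24? Think for 512 tokens."
    print(generator(prompt, max_new_tokens=1024)[0]["generated_text"])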

Maintenance & Community

  • The project acknowledges contributions and codebases from DeepSeek, Qwen, and Agentica.
  • No specific community links (Discord, Slack) or roadmap details are provided in the README.

Licensing & Compatibility

  • The license type is not explicitly stated in the provided README text.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or alpha/beta status. The reliance on specific versions of dependencies like verl might require careful management during setup and use.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
