Thinkless by VainF

LLM intelligently decides when to think

Created 8 months ago
250 stars

Top 100.0% on SourcePulse

View on GitHub
Project Summary

Thinkless is a learnable framework that enables LLMs to adaptively choose between concise and detailed reasoning based on task complexity and model confidence. It addresses the computational inefficiency of reasoning-intensive LLMs by cutting unnecessary long-form thinking, reducing the use of long-chain reasoning by 50%-90% on benchmarks. The project is aimed at researchers and engineers seeking to optimize LLM reasoning efficiency.

How It Works

The core innovation is a reinforcement learning paradigm that uses two control tokens to switch between short and long reasoning modes. Training relies on a novel Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective into separate control-token and response-accuracy losses. This decoupling stabilizes training and prevents the mode collapse that a vanilla RL objective is prone to, allowing fine-grained control over reasoning-mode selection.
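As a concrete illustration of that decomposition, here is a minimal sketch in which the single control token's loss is normalized separately from the length-averaged response loss, so mode selection is not drowned out by long responses. The tensor layout, the unclipped surrogate, the token names (<short>/<think>), and the ctrl_weight coefficient are assumptions for illustration, not the repository's implementation.

```python
# Minimal sketch of a decoupled GRPO-style loss, assuming the first token
# of each sampled sequence is the control token (<short> or <think>).
# Clipping and KL terms are omitted; all names are illustrative.
import torch

def degrpo_style_loss(new_logps: torch.Tensor,
                      old_logps: torch.Tensor,
                      advantage: torch.Tensor,
                      ctrl_weight: float = 1.0) -> torch.Tensor:
    """new_logps/old_logps: (seq_len,) per-token log-probs; advantage: scalar."""
    ratios = torch.exp(new_logps - old_logps)   # importance ratios
    per_token = -ratios * advantage             # policy-gradient surrogate
    ctrl_loss = per_token[0]                    # mode-selection (control token) loss
    resp_loss = per_token[1:].mean()            # accuracy loss, length-normalized
    # Decoupling: the lone control token gets an explicit weight instead of
    # being averaged away across a long response.
    return ctrl_weight * ctrl_loss + resp_loss
```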

Quick Start & Requirements

Setup involves creating a Conda environment with Python 3.10. Training requires pinned versions of PyTorch (2.4.0), lm_eval (0.4.8), ray (2.45.0), and nvidia-cublas-cu12. The README provides a Python snippet for quick inference via Hugging Face Transformers (a hedged sketch follows below). Links to the paper (arXiv), the SFT code, the RL model (Thinkless-1.5B-RL-DeepScaleR), and the datasets are available.
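A minimal inference sketch along the lines of the README's snippet is shown below. The Hugging Face model id and the control-token behavior in the comment are assumptions based on the README's description; check the hub for the exact id before running.

```python
# Hedged inference sketch using Hugging Face Transformers (not the repo's
# exact snippet). Model id is assumed from the README's naming.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vinnnf/Thinkless-1.5B-RL-DeepScaleR"  # assumed HF id; verify on the hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 7 * 12?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Per the description above, the model should emit a control token first,
# then either a concise answer or a long chain of thought.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=False))
```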

Highlighted Details

Empirically, Thinkless reduces long-chain thinking by 50%-90% on benchmarks like Minerva Algebra, MATH-500, and GSM8K. It offers pre-trained 1.5B parameter models for RL and warmup phases. Evaluation scripts for LM-Eval and custom answer extraction are included, facilitating performance assessment.
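For a quick benchmark run, something like the following lm-evaluation-harness call could reproduce a standard score. The model id and task choice are assumptions, and because the repo ships custom answer-extraction scripts, a generic harness run may report different numbers.

```python
# Hedged example: scoring the released model with lm_eval's Python API
# (lm-evaluation-harness 0.4.x). Model id assumed from the README's naming.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Vinnnf/Thinkless-1.5B-RL-DeepScaleR,dtype=bfloat16",
    tasks=["gsm8k"],  # one of the benchmarks named above
    batch_size=8,
)
print(results["results"]["gsm8k"])
```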

Maintenance & Community

The project acknowledges contributions from agentica-project/rllm (DeepScaleR) and Megatron-LM. It utilizes datasets like DeepScaleR and OpenThoughts2-1M. No explicit community channels (Discord/Slack) or roadmap details are provided in the README.

Licensing & Compatibility

The repository's license is not stated in the README, so prospective adopters should confirm licensing with the maintainers before use. Compatibility with commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The TODO list indicates ongoing development for resume training, larger models (7B), and releasing warmup code. The current implementation may favor conciseness, potentially requiring hyperparameter tuning (e.g., correct_think_reward) for balanced performance. Specific CUDA versions and library dependencies can pose setup challenges.

Health Check

Last Commit: 4 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Vincent Weisser (cofounder of Prime Intellect).

GITM by OpenGVLab

Top 0.2% on SourcePulse · 638 stars
LLM agent for Minecraft open-world environments
Created 2 years ago · Updated 2 years ago