Thinkless by VainF

LLM intelligently decides when to think

Created 8 months ago
250 stars

Top 100.0% on SourcePulse

View on GitHub
Project Summary

Thinkless is a learnable framework that enables LLMs to adaptively choose between concise and detailed reasoning based on task complexity and model confidence. It addresses the computational inefficiency of reasoning-intensive LLMs by cutting unnecessary long-form thinking, reducing the use of long-chain reasoning by 50%-90% on benchmarks. The project is aimed at researchers and engineers seeking to optimize LLM reasoning efficiency.

How It Works

The core innovation is a reinforcement learning paradigm that uses two control tokens to switch between short and long reasoning modes. Training relies on a novel Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective into separate control-token and response-accuracy losses. This decoupling stabilizes training and prevents the mode collapse that a vanilla RL objective is prone to, allowing fine-grained control over reasoning-mode selection.
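As a concrete illustration of that decomposition, here is a minimal sketch in which the single control token's loss is normalized separately from the length-averaged response loss, so mode selection is not drowned out by long responses. The tensor layout, the unclipped surrogate, the token names (<short>/<think>), and the ctrl_weight coefficient are assumptions for illustration, not the repository's implementation.

```python
# Minimal sketch of a decoupled GRPO-style loss, assuming the first token
# of each sampled sequence is the control token (<short> or <think>).
# Clipping and KL terms are omitted; all names are illustrative.
import torch

def degrpo_style_loss(new_logps: torch.Tensor,
                      old_logps: torch.Tensor,
                      advantage: torch.Tensor,
                      ctrl_weight: float = 1.0) -> torch.Tensor:
    """new_logps/old_logps: (seq_len,) per-token log-probs; advantage: scalar."""
    ratios = torch.exp(new_logps - old_logps)   # importance ratios
    per_token = -ratios * advantage             # policy-gradient surrogate
    ctrl_loss = per_token[0]                    # mode-selection (control token) loss
    resp_loss = per_token[1:].mean()            # accuracy loss, length-normalized
    # Decoupling: the lone control token gets an explicit weight instead of
    # being averaged away across a long response.
    return ctrl_weight * ctrl_loss + resp_loss
```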

Quick Start & Requirements

Setup involves creating a Conda environment with Python 3.10. Training requires pinned versions of PyTorch (2.4.0), lm_eval (0.4.8), ray (2.45.0), and nvidia-cublas-cu12. The README provides a Python snippet for quick inference via Hugging Face Transformers (a hedged sketch follows below). Links to the paper (arXiv), the SFT code, the RL model (Thinkless-1.5B-RL-DeepScaleR), and the datasets are available.
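A minimal inference sketch along the lines of the README's snippet is shown below. The Hugging Face model id and the control-token behavior in the comment are assumptions based on the README's description; check the hub for the exact id before running.

```python
# Hedged inference sketch using Hugging Face Transformers (not the repo's
# exact snippet). Model id is assumed from the README's naming.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vinnnf/Thinkless-1.5B-RL-DeepScaleR"  # assumed HF id; verify on the hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 7 * 12?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Per the description above, the model should emit a control token first,
# then either a concise answer or a long chain of thought.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=False))
```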

Highlighted Details

Empirically, Thinkless reduces long-chain thinking by 50%-90% on benchmarks like Minerva Algebra, MATH-500, and GSM8K. It offers pre-trained 1.5B parameter models for RL and warmup phases. Evaluation scripts for LM-Eval and custom answer extraction are included, facilitating performance assessment.
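For a quick benchmark run, something like the following lm-evaluation-harness call could reproduce a standard score. The model id and task choice are assumptions, and because the repo ships custom answer-extraction scripts, a generic harness run may report different numbers.

```python
# Hedged example: scoring the released model with lm_eval's Python API
# (lm-evaluation-harness 0.4.x). Model id assumed from the README's naming.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Vinnnf/Thinkless-1.5B-RL-DeepScaleR,dtype=bfloat16",
    tasks=["gsm8k"],  # one of the benchmarks named above
    batch_size=8,
)
print(results["results"]["gsm8k"])
```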

Maintenance & Community

The project acknowledges contributions from agentica-project/rllm (DeepScaleR) and Megatron-LM. It utilizes datasets like DeepScaleR and OpenThoughts2-1M. No explicit community channels (Discord/Slack) or roadmap details are provided in the README.

Licensing & Compatibility

The repository's license is not stated in the README, so prospective adopters should confirm licensing with the maintainers before use. Compatibility with commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The TODO list indicates ongoing development for resume training, larger models (7B), and releasing warmup code. The current implementation may favor conciseness, potentially requiring hyperparameter tuning (e.g., correct_think_reward) for balanced performance. Specific CUDA versions and library dependencies can pose setup challenges.

Health Check

Last Commit: 4 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Vincent Weisser (cofounder of Prime Intellect).

GITM by OpenGVLab

Top 0.2% on SourcePulse · 638 stars
LLM agent for Minecraft open-world environments
Created 2 years ago · Updated 2 years ago