LightReasoner by HKUDS

LLM reasoning enhancement via SLM-LLM knowledge transfer

Created 1 month ago
455 stars

Top 66.3% on SourcePulse

View on GitHub
Project Summary

LightReasoner addresses the inefficiencies of Supervised Fine-Tuning (SFT) for LLMs by letting smaller language models teach LLMs where to focus their reasoning effort. Aimed at researchers and practitioners, it delivers stronger reasoning accuracy with drastically reduced computational overhead, making advanced AI training more accessible.

How It Works

The project introduces an 'SLM-LLM Teaching' paradigm where smaller models identify critical reasoning steps for LLMs. Its three-stage framework involves: (1) selecting informative steps via Expert-Amateur KL divergence, (2) generating contrastive supervision signals from behavioral differentials, and (3) self-distilling expert strengths. This approach prioritizes strategic token optimization over exhaustive training, achieving extreme token efficiency and verification-free learning without ground-truth labels.
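
As a rough illustration of the step-selection stage, the sketch below scores each position of a reasoning trace by the KL divergence between the Expert's and Amateur's next-token distributions and keeps the highest-divergence positions. This is a minimal sketch, not the repository's implementation: the function names, tensor shapes, KL direction, and top-k cutoff are all assumptions.

```python
import torch
import torch.nn.functional as F

def step_divergence(expert_logits: torch.Tensor, amateur_logits: torch.Tensor) -> torch.Tensor:
    """Per-position KL(expert || amateur) over the vocabulary.

    Both inputs are [seq_len, vocab_size] next-token logits produced by the
    Expert and Amateur models on the same reasoning trace. A larger KL means
    a larger behavioral gap at that step, i.e. a more informative step.
    """
    expert_logp = F.log_softmax(expert_logits, dim=-1)
    amateur_logp = F.log_softmax(amateur_logits, dim=-1)
    return (expert_logp.exp() * (expert_logp - amateur_logp)).sum(dim=-1)

def select_informative_steps(expert_logits, amateur_logits, top_k=16):
    """Return the indices of the top-k highest-divergence positions."""
    kl = step_divergence(expert_logits, amateur_logits)
    return torch.topk(kl, k=min(top_k, kl.numel())).indices
```

In this picture, only the selected positions feed the contrastive-supervision and self-distillation stages, which is how the strategic-token approach keeps the number of tuned tokens small.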

Quick Start & Requirements

  • Installation: clone the repo (git clone https://github.com/HKUDS/LightReasoner.git), cd LightReasoner, then pip install -r requirements.txt. Python 3.10+ is required.
  • Models: download the Expert and Amateur models from Hugging Face (e.g., Qwen2.5-Math-1.5B); see the sketch below.
  • Pipeline: data preparation (data_prep.py), sampling (LightR_sampling.py), fine-tuning (LightR_finetuning.py). Pre-collected datasets (LRsamples) let you bypass the sampling stage.
  • Key resources: paper arXiv:2510.07962, Hugging Face models.
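
For the model-download step, a minimal sketch using the Hugging Face transformers library might look like the following. The Expert id matches the model named above; the Amateur id and dtype handling are illustrative assumptions, and the repository's own scripts may load models differently.

```python
# Minimal sketch: pull an Expert/Amateur pair from Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

EXPERT_ID = "Qwen/Qwen2.5-Math-1.5B"
AMATEUR_ID = "Qwen/Qwen2.5-0.5B"  # illustrative choice, not prescribed by the repo

tokenizer = AutoTokenizer.from_pretrained(EXPERT_ID)
expert = AutoModelForCausalLM.from_pretrained(EXPERT_ID, torch_dtype="auto")
amateur = AutoModelForCausalLM.from_pretrained(AMATEUR_ID, torch_dtype="auto")
```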

Highlighted Details

  • Achieves consistent zero-shot pass@1 accuracy improvements across 7 benchmarks/5 models (e.g., +28.1% GSM8K for Qwen2.5-Math-1.5B).
  • Demonstrates remarkable efficiency gains over SFT: 90% less total time, 80% fewer sampled problems, 99% fewer tuned tokens.
  • Exhibits strong generalization, improving performance across benchmarks even when trained solely on GSM8K.
  • Relies on domain-expertise gaps rather than raw model-size differences for effective Expert-Amateur collaboration.

Maintenance & Community

New components were actively being released as of October 2025. The README lists no community channels (Discord/Slack), roadmap, or contributor/sponsorship details.

Licensing & Compatibility

Permissive MIT License, compatible with commercial use and closed-source linking.

Limitations & Caveats

The method's success depends critically on the Expert-Amateur pairing: a balanced 'sweet spot' matters more than a wide capability gap, and performance gains diminish as the Amateur approaches the Expert's capability. Adapting to new datasets may require adjusting hyperparameters or swapping the Amateur model. Sampling results show minor variations across Torch versions.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 5
  • Star History: 469 stars in the last 30 days
