LightReasoner by HKUDS

LLM reasoning enhancement via SLM-LLM knowledge transfer

Created 1 month ago
455 stars

Top 66.3% on SourcePulse

View on GitHub
Project Summary

LightReasoner addresses the inefficiencies of Supervised Fine-Tuning (SFT) for LLMs by letting smaller language models teach LLMs where to focus their reasoning effort. Aimed at researchers and practitioners, it delivers stronger reasoning accuracy with drastically reduced computational overhead, making advanced AI training more accessible.

How It Works

The project introduces an 'SLM-LLM Teaching' paradigm where smaller models identify critical reasoning steps for LLMs. Its three-stage framework involves: (1) selecting informative steps via Expert-Amateur KL divergence, (2) generating contrastive supervision signals from behavioral differentials, and (3) self-distilling expert strengths. This approach prioritizes strategic token optimization over exhaustive training, achieving extreme token efficiency and verification-free learning without ground-truth labels.
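
As a rough illustration of the step-selection stage, the sketch below scores each position of a reasoning trace by the KL divergence between the Expert's and Amateur's next-token distributions and keeps the highest-divergence positions. This is a minimal sketch, not the repository's implementation: the function names, tensor shapes, KL direction, and top-k cutoff are all assumptions.

```python
import torch
import torch.nn.functional as F

def step_divergence(expert_logits: torch.Tensor, amateur_logits: torch.Tensor) -> torch.Tensor:
    """Per-position KL(expert || amateur) over the vocabulary.

    Both inputs are [seq_len, vocab_size] next-token logits produced by the
    Expert and Amateur models on the same reasoning trace. A larger KL means
    a larger behavioral gap at that step, i.e. a more informative step.
    """
    expert_logp = F.log_softmax(expert_logits, dim=-1)
    amateur_logp = F.log_softmax(amateur_logits, dim=-1)
    return (expert_logp.exp() * (expert_logp - amateur_logp)).sum(dim=-1)

def select_informative_steps(expert_logits, amateur_logits, top_k=16):
    """Return the indices of the top-k highest-divergence positions."""
    kl = step_divergence(expert_logits, amateur_logits)
    return torch.topk(kl, k=min(top_k, kl.numel())).indices
```

In this picture, only the selected positions feed the contrastive-supervision and self-distillation stages, which is how the strategic-token approach keeps the number of tuned tokens small.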

Quick Start & Requirements

  • Installation: clone the repo (git clone https://github.com/HKUDS/LightReasoner.git), cd LightReasoner, then pip install -r requirements.txt. Python 3.10+ is required.
  • Models: download the Expert and Amateur models from Hugging Face (e.g., Qwen2.5-Math-1.5B); see the sketch below.
  • Pipeline: data preparation (data_prep.py), sampling (LightR_sampling.py), fine-tuning (LightR_finetuning.py). Pre-collected datasets (LRsamples) let you bypass the sampling stage.
  • Key resources: paper arXiv:2510.07962, Hugging Face models.
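
For the model-download step, a minimal sketch using the Hugging Face transformers library might look like the following. The Expert id matches the model named above; the Amateur id and dtype handling are illustrative assumptions, and the repository's own scripts may load models differently.

```python
# Minimal sketch: pull an Expert/Amateur pair from Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

EXPERT_ID = "Qwen/Qwen2.5-Math-1.5B"
AMATEUR_ID = "Qwen/Qwen2.5-0.5B"  # illustrative choice, not prescribed by the repo

tokenizer = AutoTokenizer.from_pretrained(EXPERT_ID)
expert = AutoModelForCausalLM.from_pretrained(EXPERT_ID, torch_dtype="auto")
amateur = AutoModelForCausalLM.from_pretrained(AMATEUR_ID, torch_dtype="auto")
```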

Highlighted Details

  • Achieves consistent zero-shot pass@1 accuracy improvements across 7 benchmarks/5 models (e.g., +28.1% GSM8K for Qwen2.5-Math-1.5B).
  • Demonstrates remarkable efficiency gains over SFT: 90% less total time, 80% fewer sampled problems, 99% fewer tuned tokens.
  • Exhibits strong generalization, improving performance across benchmarks even when trained solely on GSM8K.
  • Relies on domain-expertise gaps rather than raw model-size differences for effective Expert-Amateur collaboration.

Maintenance & Community

New components were actively being released as of October 2025. The README lists no community channels (Discord/Slack), roadmap, or contributor/sponsorship details.

Licensing & Compatibility

Permissive MIT License, compatible with commercial use and closed-source linking.

Limitations & Caveats

The method's success depends critically on the Expert-Amateur pairing: a balanced 'sweet spot' matters more than a wide capability gap, and performance gains diminish as the Amateur approaches the Expert's capability. Adapting to new datasets may require adjusting hyperparameters or swapping the Amateur model. Sampling results show minor variations across Torch versions.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 5
  • Star History: 469 stars in the last 30 days
