HKUDS/LightReasoner: LLM reasoning enhancement via SLM-LLM knowledge transfer
LightReasoner addresses the inefficiency of Supervised Fine-Tuning (SFT) for LLMs by letting smaller models teach LLMs where to focus their reasoning. Aimed at researchers and practitioners, it delivers superior reasoning accuracy with drastically reduced computational overhead, making advanced AI training more accessible.
How It Works
The project introduces an 'SLM-LLM Teaching' paradigm where smaller models identify critical reasoning steps for LLMs. Its three-stage framework involves: (1) selecting informative steps via Expert-Amateur KL divergence, (2) generating contrastive supervision signals from behavioral differentials, and (3) self-distilling expert strengths. This approach prioritizes strategic token optimization over exhaustive training, achieving extreme token efficiency and verification-free learning without ground-truth labels.
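A minimal sketch of the first two stages described above, assuming PyTorch tensors of per-position logits from the Expert and Amateur models. The function names, the KL direction, the top-fraction threshold, and the beta sharpening factor are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch only: step selection via Expert-Amateur KL divergence
# and a contrastive supervision signal built from their behavioral difference.
import torch
import torch.nn.functional as F

def select_informative_steps(expert_logits, amateur_logits, top_frac=0.1):
    """Rank token positions by KL(expert || amateur) and keep the top fraction.

    expert_logits, amateur_logits: [seq_len, vocab_size] logits from the two
    models on the same prefix. Returns indices of the highest-divergence steps.
    """
    expert_logp = F.log_softmax(expert_logits, dim=-1)
    amateur_logp = F.log_softmax(amateur_logits, dim=-1)
    # Per-position KL: sum_v p_e(v) * (log p_e(v) - log p_a(v))
    kl = (expert_logp.exp() * (expert_logp - amateur_logp)).sum(dim=-1)
    k = max(1, int(top_frac * kl.numel()))
    return torch.topk(kl, k).indices  # positions where the Expert "knows more"

def contrastive_target(expert_logits, amateur_logits, beta=1.0):
    """Build a soft supervision target that sharpens the Expert distribution
    where it disagrees with the Amateur (a contrastive signal, assumed form)."""
    expert_logp = F.log_softmax(expert_logits, dim=-1)
    amateur_logp = F.log_softmax(amateur_logits, dim=-1)
    return F.softmax(expert_logp + beta * (expert_logp - amateur_logp), dim=-1)
```

In this sketch, fine-tuning (stage 3) would then distill the LLM toward the contrastive targets only at the selected positions, which is how the strategic token optimization avoids training on every token.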
Quick Start & Requirements
Installation: clone the repo (git clone https://github.com/HKUDS/LightReasoner.git), cd LightReasoner, then pip install -r requirements.txt. Python 3.10+ is required. Download the Expert and Amateur models from Hugging Face (e.g., Qwen2.5-Math-1.5B). Pipeline: data preparation (data_prep.py), sampling (LightR_sampling.py), and fine-tuning (LightR_finetuning.py). Pre-collected datasets (LRsamples) let you bypass the sampling stage. Key resources: paper arXiv:2510.07962 and the Hugging Face models.
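A minimal sketch of loading an Expert/Amateur pair with the transformers library before running the sampling step. The Amateur ID follows the README's example; the Expert ID (Qwen2.5-Math-7B) and the use of device_map (which requires accelerate) are assumptions for illustration, not taken from the repository.

```python
# Load the Expert and Amateur models from Hugging Face (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

AMATEUR_ID = "Qwen/Qwen2.5-Math-1.5B"   # example cited in the README
EXPERT_ID = "Qwen/Qwen2.5-Math-7B"      # assumed larger counterpart

tokenizer = AutoTokenizer.from_pretrained(EXPERT_ID)
expert = AutoModelForCausalLM.from_pretrained(
    EXPERT_ID, torch_dtype="auto", device_map="auto"
)
amateur = AutoModelForCausalLM.from_pretrained(
    AMATEUR_ID, torch_dtype="auto", device_map="auto"
)
```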
Highlighted Details
Maintenance & Community
Actively releasing new components as of Oct 2025. README lacks specific community channels (Discord/Slack), roadmap, or contributor/sponsorship details.
Licensing & Compatibility
Permissive MIT License, compatible with commercial use and closed-source linking.
Limitations & Caveats
The method's success depends critically on the Expert-Amateur pairing: a balanced 'sweet spot' matters more than a wide capability gap, and gains diminish as the Amateur approaches the Expert's capability. Adapting to new datasets may require adjusting hyperparameters or swapping the Amateur model. Sampling results show minor variation across PyTorch versions.