knoveleng: Reinforcement learning for small LLM reasoning
Top 95.6% on SourcePulse
This repository provides code and datasets for improving the reasoning of small LLMs (1.5B parameters) with reinforcement learning. It targets researchers and practitioners working under resource constraints, and it reports significant improvements on mathematical reasoning benchmarks from a low-cost fine-tuning approach.
How It Works
The project adapts the Group Relative Policy Optimization (GRPO) algorithm to fine-tune small LLMs on a curated mathematical reasoning dataset. The goal is to improve reasoning efficiently, and the project reports notable gains on benchmarks such as AMC23 and AIME24 while using a fraction of the data and cost needed for larger models.
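In GRPO, each prompt gets a group of sampled completions, and each completion's advantage is its reward normalized against that group's mean and standard deviation, so no learned value model is needed. The sketch here shows that normalization step plus the PPO-style clipped term that GRPO applies to each token's probability ratio. It is a minimal illustration under stated assumptions, not the repository's training code: it uses the population standard deviation, a small epsilon of 1e-4, and a clip range of 0.2 (implementations differ on all three), and it omits the KL penalty against a reference model. The function names are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against its own group (one prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (assumption); some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-token PPO-style clipped objective; ratio = pi_new(token) / pi_old(token)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. 1 = correct final answer, 0 = wrong
    advantages = group_relative_advantages(rewards)
    print(advantages)
    print(clipped_surrogate(1.5, advantages[0]))
```

With binary correctness rewards of [1, 0, 0, 1], the correct completions receive an advantage of about +1 and the incorrect ones about -1. A group in which every completion gets the same reward yields all-zero advantages and so contributes no learning signal for that prompt.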
Quick Start & Requirements
Uses uv for environment management. Install dependencies including vllm (v0.7.2) and flash-attn (requires PyTorch v2.5.1).
Highlighted Details
… accelerate and lighteval.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project reports optimization instability and problems with length constraints during extended training. PyTorch versions other than v2.5.1 may cause compatibility issues because of vLLM's version requirements.
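One way to catch the PyTorch/vLLM version mismatch before launching a run is to compare installed package versions against the pins stated in this summary (torch 2.5.1, vllm 0.7.2). This sketch uses only the standard library's importlib.metadata. The helper names are hypothetical and not part of the repository, and treating a local build tag such as "+cu124" as matching the pin is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from this summary: PyTorch v2.5.1 and vLLM v0.7.2.
REQUIRED = {"torch": "2.5.1", "vllm": "0.7.2"}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(installed, required):
    """Return {package: problem description} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in required.items():
        have = installed.get(name)
        if have is None:
            problems[name] = f"missing (need {wanted})"
        # Assumption: ignore local build tags such as "+cu124" when comparing.
        elif have.split("+")[0] != wanted:
            problems[name] = f"found {have}, need {wanted}"
    return problems


if __name__ == "__main__":
    issues = pin_mismatches(installed_versions(REQUIRED), REQUIRED)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("torch and vllm match the pinned versions")
```

Run as a script, it prints each unmet pin. flash-attn is not checked because the summary gives no version for it, only its PyTorch requirement.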
2 weeks ago
Inactive
ByteDance-Seed
OFA-Sys
openreasoner