Reinforcement learning for small LLM reasoning
This repository provides code and datasets for enhancing reasoning in small LLMs (1.5B parameters) using reinforcement learning, targeting researchers and practitioners with resource constraints. It demonstrates significant improvements on mathematical reasoning benchmarks with a cost-effective fine-tuning approach.
How It Works
The project adapts the Group Relative Policy Optimization (GRPO) algorithm for fine-tuning small LLMs on a curated mathematical reasoning dataset. This approach aims to improve reasoning capabilities efficiently, achieving notable gains on benchmarks like AMC23 and AIME24 with a fraction of the data and cost of larger models.
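The core idea behind GRPO can be illustrated with its group-relative advantage: several answers are sampled for the same prompt, and each answer's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The sketch below is illustrative only. The function name, the binary reward values, the `eps` constant, and the use of the population standard deviation are assumptions, not details taken from the repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (one prompt, several samples)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps keeps the division defined when every sample earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 = correct final answer, 0.0 = incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat their group's average receive a positive advantage and are reinforced, while below-average answers are discouraged. When every sample in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.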
Quick Start & Requirements
Uses uv for environment management. Install dependencies including vllm (v0.7.2) and flash-attn (requires PyTorch v2.5.1).
Highlighted Details
accelerate and lighteval.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project notes challenges with optimization instability and length constraints during extended training. Using a PyTorch version other than v2.5.1 may cause issues due to vLLM requirements.
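Because the setup depends on specific versions, it can help to check the environment before starting a long training run. The following is a minimal sketch that compares installed package versions against the versions quoted in this summary. The `find_mismatches` helper and the `EXPECTED` table are hypothetical, and the distribution names `torch` and `vllm` are assumed to be the usual PyPI names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in this summary; distribution names are assumed.
EXPECTED = {"torch": "2.5.1", "vllm": "0.7.2"}


def find_mismatches(expected, lookup=version):
    """Return {package: installed_or_None} for every package that misses its pin."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # A local build tag such as "2.5.1+cu124" still counts as 2.5.1.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed}")
```

Running the script prints one line per package whose installed version differs from the expected one and prints nothing when the environment matches.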