Framework for off-policy learning in large reasoning models
LUFFY is a reinforcement learning framework designed to enhance large reasoning models by integrating off-policy guidance. It targets researchers and developers working on improving the reasoning capabilities of LLMs, offering a method to leverage external reasoning traces for more effective training.
How It Works
LUFFY builds upon the GRPO framework, combining on-policy rollouts with off-policy demonstrations. It introduces a novel approach to advantage estimation by incorporating these external traces and employs policy shaping via regularized importance sampling. This allows LUFFY to dynamically balance imitation and exploration, emphasizing crucial but low-probability actions for better generalization.
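The mechanism above can be illustrated with a minimal sketch. This is not LUFFY's actual implementation: the shaping function `f(r) = r / (r + gamma)`, the `gamma` value, and the function names are assumptions made here for illustration. The idea shown is that off-policy demonstration tokens receive a bounded, regularized importance weight, which raises the relative gradient contribution of low-probability tokens instead of letting raw importance ratios collapse toward zero.

```python
import numpy as np

def shaped_weight(ratio, gamma=0.1):
    # Hypothetical regularized importance weight f(r) = r / (r + gamma).
    # It is bounded in [0, 1) and boosts the relative weight of
    # low-probability (small-ratio) tokens from off-policy traces.
    return ratio / (ratio + gamma)

def mixed_policy_loss(logp_new, logp_behavior, advantages, off_policy_mask, gamma=0.1):
    # Per-token importance ratio pi_new / pi_behavior.
    ratio = np.exp(logp_new - logp_behavior)
    # On-policy rollout tokens keep the plain ratio (as in GRPO-style
    # objectives); off-policy demonstration tokens get the shaped weight.
    weight = np.where(off_policy_mask, shaped_weight(ratio, gamma), ratio)
    # Policy-gradient surrogate: maximize advantage-weighted likelihood.
    return -np.mean(weight * advantages)
```

Under this shaping, a token with ratio 0.01 keeps roughly a tenth of the weight of a token with ratio 1.0 (at `gamma=0.1`), rather than one hundredth, which is one way to read "emphasizing crucial but low-probability actions."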
Quick Start & Requirements
Install dependencies, then install the package in editable mode:
pip install -r requirements.txt
pip install -e .
Run the same editable install for both the main package and verl.
Maintenance & Community
The project is actively maintained, with recent updates integrating more baseline implementations and re-evaluating models. Contact information for the authors is provided.
Licensing & Compatibility
The repository does not explicitly state a license. The project utilizes components from other open-source projects, and users should verify compatibility for commercial or closed-source use.
Limitations & Caveats
The project is described as having an "alpha" status. While comprehensive benchmarks are provided, hardware requirements for training and performance metrics beyond benchmark scores are not extensively documented. The unspecified license may be a barrier for some users.