nick7nlp / On-Policy Distillation for Large Language Models
Top 91.8% on SourcePulse
This repository is a curated collection of resources on On-Policy Distillation (OPD) for Large Language Models (LLMs). It targets the problem of compounding errors in LLM reasoning and gives researchers, engineers, and power users a structured path to understanding, implementing, and advancing this post-training paradigm. Its main benefit is a centralized, up-to-date knowledge base intended to speed up adoption of OPD and further work on more robust and capable LLMs.
How It Works
On-Policy Distillation (OPD) tackles the exposure bias inherent in traditional off-policy methods like Supervised Fine-Tuning (SFT). Instead of learning from static teacher demonstrations, the student model generates its own trajectories. These self-generated trajectories are then evaluated by a teacher model, reward model, or verifier, which provides a dense, token-level supervision signal. The student therefore learns from its own mistakes under its own generative distribution, which the collection presents as essential for scaling LLMs with complex reasoning capabilities.
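The loop described above can be illustrated with a small toy sketch. It is not taken from the repository, which contains no code. It assumes a four-token vocabulary, hypothetical `toy_policy` functions standing in for the student and teacher, and a per-token reverse KL divergence as the supervision signal. Reverse KL is one common choice for teacher-scored OPD, but the source does not name a specific loss.

```python
import math
import random

# Toy vocabulary; a real model would have tens of thousands of tokens.
VOCAB = ["a", "b", "c", "<eos>"]


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_policy(bias):
    """Hypothetical stand-in for an LLM: maps a prefix to next-token probabilities."""
    def policy(prefix):
        # Logits depend only on the prefix length so the toy stays simple.
        logits = [bias[i] + 0.1 * ((len(prefix) + i) % 3) for i in range(len(VOCAB))]
        return softmax(logits)
    return policy


def sample_trajectory(student, rng, max_len=8):
    """On-policy step: the student generates its own sequence of token ids."""
    prefix = []
    for _ in range(max_len):
        probs = student(prefix)
        idx = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        prefix.append(idx)
        if VOCAB[idx] == "<eos>":
            break
    return prefix


def per_token_reverse_kl(student, teacher, trajectory):
    """Dense signal: KL(student || teacher) at every position of the student's sample."""
    losses = []
    for t in range(len(trajectory)):
        prefix = trajectory[:t]
        p_student = student(prefix)
        p_teacher = teacher(prefix)
        kl = sum(s * math.log(s / q) for s, q in zip(p_student, p_teacher))
        losses.append(kl)
    return losses


rng = random.Random(0)
student = toy_policy([0.0, 0.0, 0.0, 0.0])   # assumed, untrained student
teacher = toy_policy([1.0, 0.5, -0.5, 0.0])  # assumed, stronger teacher
traj = sample_trajectory(student, rng)
token_losses = per_token_reverse_kl(student, teacher, traj)
print(len(traj), sum(token_losses) / len(token_losses))
```

In an actual training run the per-token values would be averaged into a loss and backpropagated through the student. The key point the sketch shows is that the prefixes being scored come from the student's own samples rather than from a fixed demonstration set.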
Quick Start & Requirements
This repository is a curated resource hub, not a runnable codebase. The primary starting points are the comprehensive survey paper available on arXiv and the companion OPDHub website, which offers full-text search and multi-axis filtering of OPD methods. Recommended reading orders and specific paper suggestions are provided for users with different backgrounds or focusing on particular tasks like math reasoning or agent development.
Highlighted Details
Maintenance & Community
The repository is actively maintained, with recent updates noted in June 2026. Contributions are welcomed via Pull Requests and Issues, fostering a collaborative environment for expanding and refining the collection.
Licensing & Compatibility
The repository itself does not specify a license. Users should refer to the individual papers and resources linked within for their respective licensing terms and compatibility constraints, particularly concerning commercial use.
Limitations & Caveats
As a curated list of research papers and resources, this repository does not provide direct code for implementing OPD. Users must consult the cited papers for implementation details. The field of OPD is rapidly evolving, requiring continuous updates to the collection. The complexity of OPD methods necessitates a strong background in machine learning and LLM training.