chrisliu298
On-policy distillation techniques for LLM training and alignment
Top 86.7% on SourcePulse
This repository curates resources on On-Policy Distillation (OPD), a technique for training Large Language Models (LLMs) in which a student model learns from its own generated samples under the guidance of a teacher model. OPD addresses the train-inference distribution gap that is prevalent in off-policy distillation and supervised fine-tuning. The list is aimed at researchers and engineers, and it presents OPD as a post-training primitive adopted by major AI labs.
How It Works
OPD trains a student LLM on trajectories sampled from its own evolving policy, with a teacher model providing dense, token-level supervision on those trajectories. Because the training data comes from the student itself, the distribution mismatch between training and inference is smaller than in off-policy methods, where the student trains on sequences it did not generate. OPD can be viewed either as reinforcement learning with teacher-defined rewards or as Generalized Knowledge Distillation (GKD) applied to student rollouts.
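The loop described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the method of any specific paper or framework in the list: the "models" are hypothetical lookup tables of logits keyed by the previous token rather than real LLMs, the names VOCAB, TEACHER, sample_rollout, and opd_step are invented for the example, and the per-token loss is the reverse KL divergence from student to teacher, which is one common choice among several.

```python
import math
import random

VOCAB = 4  # hypothetical toy vocabulary size

# Hypothetical fixed teacher: one row of next-token logits per previous token.
TEACHER = [
    [2.0, 0.0, -1.0, 0.5],
    [0.0, 1.5, 0.0, -0.5],
    [-1.0, 0.0, 2.0, 0.0],
    [0.5, -0.5, 0.0, 1.0],
]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(p, q):
    """KL(p || q), with p the student and q the teacher distribution."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

def sample_rollout(student_logits, length, rng):
    """Sample a trajectory from the student's own current policy."""
    tokens = [0]  # fixed start token (an assumption of this toy setup)
    for _ in range(length):
        probs = softmax(student_logits[tokens[-1]])
        tokens.append(rng.choices(range(VOCAB), weights=probs)[0])
    return tokens

def opd_step(student_logits, teacher_logits, length, lr, rng):
    """One on-policy step: roll out the student, then descend the per-token
    reverse KL at every state the student actually visited."""
    tokens = sample_rollout(student_logits, length, rng)
    total = 0.0
    for prev in tokens[:-1]:
        p = softmax(student_logits[prev])
        q = softmax(teacher_logits[prev])
        kl = reverse_kl(p, q)
        total += kl
        # Analytic gradient of KL(softmax(z) || q) with respect to logits z.
        # Updates are applied in place, state by state.
        for i in range(VOCAB):
            grad = p[i] * (math.log(p[i]) - math.log(q[i]) - kl)
            student_logits[prev][i] -= lr * grad
    return total / length  # mean per-token loss over the rollout

if __name__ == "__main__":
    rng = random.Random(0)
    student = [[0.0] * VOCAB for _ in range(VOCAB)]  # uniform initial policy
    for step in range(300):
        loss = opd_step(student, TEACHER, length=8, lr=0.5, rng=rng)
        if step % 100 == 0:
            print(step, round(loss, 4))
```

Only states that appear in the student's own rollouts receive updates, which is the on-policy property: supervision is concentrated where the student's policy actually goes. A real implementation would replace the lookup tables with a student and a teacher network and compute the gradient by automatic differentiation.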
Quick Start & Requirements
This is a curated collection, not a single installable project. Users should consult the "Frameworks and Implementations" section for tools like TRL, NeMo-RL, and KDFlow. Specific requirements depend on the chosen framework; links to official documentation are provided.
Highlighted Details
Maintenance & Community
This "Awesome" list repository curates research and resources, acknowledging parallel efforts and providing contribution guidelines. Direct community channels (e.g., Discord, Slack) are not explicitly listed.
Licensing & Compatibility
The collection itself lacks a specified license. Users must review the individual licenses of linked papers and frameworks for commercial use or closed-source compatibility.
Limitations & Caveats
Because this is a curated list, users must select and integrate specific frameworks or papers themselves. The field is evolving rapidly, and many 2026 papers address known failure modes such as instability or diversity collapse, so any chosen OPD technique should be evaluated carefully.