thinkwee: Awesome list for On-Policy Distillation in LLM training
Top 64.9% on SourcePulse
Summary
AwesomeOPD curates open-source repositories and papers on On-Policy Distillation (OPD) and On-Policy Self-Distillation (OPSD) for training LLMs, VLMs, and agents. It gives researchers and engineers a structured overview with per-entry annotations, which helps them compare methods and decide which ones to evaluate or adopt.
How It Works
OPD trains a student model by having it sample its own trajectories (y ~ π_student(·|x)) and then supervising those samples with a teacher model, typically via per-token logits. OPSD is a variant in which the teacher is the same model as the student but conditioned differently (e.g., on privileged context). Each entry is annotated along four axes (teacher source, supervision signal, rollout consumption, and pipeline slot) so that methods can be compared axis by axis.
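The sampling-then-supervision loop described above can be sketched in a few lines of plain Python. This is only an illustrative toy, not code from any listed project: the two "models" are hypothetical hand-written logit functions, the vocabulary is four tokens, and the per-token reverse KL divergence KL(student || teacher) is an assumed choice of loss (the list only says supervision is typically via per-token logits).

```python
import math
import random

VOCAB = ["a", "b", "c", "<eos>"]  # hypothetical toy vocabulary


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_logits(context):
    """Hypothetical student: weakly prefers one token, chosen by context length."""
    logits = [0.0] * len(VOCAB)
    logits[len(context) % len(VOCAB)] = 1.0
    return logits


def teacher_logits(context):
    """Hypothetical teacher: same preference as the student, but sharper."""
    logits = [0.0] * len(VOCAB)
    logits[len(context) % len(VOCAB)] = 3.0
    return logits


def sample_rollout(rng, prompt, max_len=8):
    """On-policy step: the student samples y ~ pi_student(. | x)."""
    tokens = []
    while len(tokens) < max_len:
        probs = softmax(student_logits(prompt + tokens))
        idx = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        tokens.append(idx)
        if VOCAB[idx] == "<eos>":
            break
    return tokens


def reverse_kl(p_student, p_teacher):
    """KL(student || teacher) over one next-token distribution."""
    return sum(
        ps * (math.log(ps) - math.log(pt))
        for ps, pt in zip(p_student, p_teacher)
    )


def opd_loss(prompt, tokens):
    """Mean per-token reverse KL, evaluated on the student's own prefixes."""
    per_token = []
    for t in range(len(tokens)):
        context = prompt + tokens[:t]
        p_s = softmax(student_logits(context))
        p_t = softmax(teacher_logits(context))
        per_token.append(reverse_kl(p_s, p_t))
    return sum(per_token) / len(per_token)


rng = random.Random(0)
prompt = [0]  # token ids for the input x
rollout = sample_rollout(rng, prompt)
print("rollout:", [VOCAB[i] for i in rollout])
print("OPD loss:", opd_loss(prompt, rollout))
```

The point of the sketch is that the teacher is only ever queried on prefixes the student itself produced. In a real setup the logit functions would be neural networks and the loss would be minimized by gradient descent on the student; for OPSD, `teacher_logits` would be the same model called with extra context.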
Quick Start & Requirements
This is an "awesome list" of research and projects, not a single installable framework. It provides no direct installation or execution commands. Users must refer to individual linked papers and repositories for specific implementation details, requirements, and setup.
Highlighted Details
Maintenance & Community
Maintained by "AwesomeOPD Contributors" with an open invitation for Pull Requests (PRs). The GitHub repository is the primary community hub.
Licensing & Compatibility
The README does not specify a license for the list. Commercial use or closed-source compatibility depends on the licenses of individual referenced projects.
Limitations & Caveats
The maintainers acknowledge that "errors are possible." Some entries are borderline and may need closer analysis to confirm that they meet the strict OPD definition (student sampling plus teacher supervision). The list intentionally excludes related methods such as pure RL and offline distillation.