AwesomeOPD by thinkwee

Awesome list for On-Policy Distillation in LLM training

Created 2 months ago

736 stars

Top 46.2% on SourcePulse

Project Summary

AwesomeOPD curates open-source repositories and papers on On-Policy Distillation (OPD) and On-Policy Self-Distillation (OPSD) for training LLMs, VLMs, and agents. Each entry carries structured annotations, giving researchers and engineers an overview they can use to evaluate these methods and decide which to adopt.

How It Works

OPD trains a student model on trajectories it samples itself (y ~ π_student(·|x)); a teacher model then supervises those samples, typically via per-token logits. OPSD is a variant in which the teacher is the same model as the student, conditioned differently (e.g., on privileged context). Entries are annotated along four axes (teacher source, supervision signal, rollout consumption, and pipeline slot) so that methods can be compared axis by axis.
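The loop described above can be illustrated with a minimal toy sketch. It is not taken from any repository in the list. It assumes a three-token vocabulary, fixed toy logits for both models, and a per-token reverse KL, KL(student || teacher), as the supervision signal. The reverse KL is one common choice and an assumption here, since the summary only says supervision is typically via per-token logits. The names `toy_student`, `toy_teacher`, `sample_trajectory`, and `per_token_reverse_kl` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectory(student_logits_fn, prompt, length, rng):
    """Sample y ~ pi_student(.|x) one token at a time (the on-policy step)."""
    tokens = []
    for _ in range(length):
        probs = softmax(student_logits_fn(prompt, tokens))
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def per_token_reverse_kl(student_logits_fn, teacher_logits_fn, prompt, tokens):
    """Score each position of the student's own sample with KL(student || teacher).

    Assumption: reverse KL over full per-token distributions; real methods
    may use other divergences or signals.
    """
    losses = []
    for t in range(len(tokens)):
        prefix = tokens[:t]
        p_s = softmax(student_logits_fn(prompt, prefix))
        p_t = softmax(teacher_logits_fn(prompt, prefix))
        losses.append(
            sum(ps * (math.log(ps) - math.log(pt)) for ps, pt in zip(p_s, p_t))
        )
    return losses


# Hypothetical toy models: each returns fixed logits over a 3-token vocabulary.
def toy_student(prompt, prefix):
    return [0.0, 0.0, 0.0]


def toy_teacher(prompt, prefix):
    return [2.0, 0.0, -1.0]


rng = random.Random(0)
tokens = sample_trajectory(toy_student, "x", 4, rng)
losses = per_token_reverse_kl(toy_student, toy_teacher, "x", tokens)
mean_loss = sum(losses) / len(losses)  # the quantity a training step would minimize
```

In a real setup the two functions would be forward passes of neural networks and `mean_loss` would be backpropagated through the student only. For OPSD, the teacher function would wrap the same model as the student, called with additional privileged context.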

Quick Start & Requirements

This is an "awesome list" of research and projects, not an installable framework, so it provides no installation or execution commands. Refer to the individual linked papers and repositories for implementation details, requirements, and setup.

Highlighted Details

Features a comprehensive taxonomy (Surveys, White-Box, Black-Box, OPSD, OPD-RL Hybrids, etc.).
Each entry is annotated with teacher source, supervision signal, rollout consumption, and pipeline slot.
Curation uses LLM agents and manual review, with a disclaimer for potential errors.
Last updated April 30, 2026.
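One way to picture the four annotation axes is as a small record type that can be grouped or filtered by any axis. The sketch below is purely illustrative: the `Entry` class, the `group_by_axis` helper, the entry names, and every axis value are hypothetical placeholders, not the list's actual schema or contents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A list entry annotated along the four axes (field values are free text here)."""
    name: str
    teacher_source: str
    supervision_signal: str
    rollout_consumption: str
    pipeline_slot: str


def group_by_axis(entries, axis):
    """Group entry names by the value they take on one annotation axis."""
    groups = defaultdict(list)
    for entry in entries:
        groups[getattr(entry, axis)].append(entry.name)
    return dict(groups)


# Hypothetical entries with placeholder annotations, for illustration only.
entries = [
    Entry("method-a", "external teacher", "per-token logits", "used once", "post-training"),
    Entry("method-b", "same model, privileged context", "per-token logits", "used once", "post-training"),
    Entry("method-c", "external teacher", "sequence-level score", "replayed", "mid-training"),
]

by_teacher = group_by_axis(entries, "teacher_source")
by_signal = group_by_axis(entries, "supervision_signal")
```

Grouping on `teacher_source` separates the self-distillation style entry from those with an external teacher, which is the kind of axis-by-axis comparison the annotations are meant to support.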

Maintenance & Community

Maintained by "AwesomeOPD Contributors" with an open invitation for Pull Requests (PRs). The GitHub repository is the primary community hub.

Licensing & Compatibility

The README does not specify a license for the list. Commercial use or closed-source compatibility depends on the licenses of individual referenced projects.

Limitations & Caveats

The maintainers acknowledge that "errors are possible" in the curation. Some entries may be borderline, or may need deeper analysis to confirm that they meet the strict OPD definition (student sampling plus teacher supervision). The list intentionally excludes related methods such as pure RL and offline distillation.

Explore Similar Projects

tessera by zengxiao-he

Awesome-LLM-On-Policy-Distillation by nick7nlp

awesome-on-policy-distillation by chrisliu298

G-OPD by RUCBM

distill-sd by segmind

OPD by thunlp

Awesome-Knowledge-Distillation-of-LLMs by Tebmer

knowledge-distillation-papers by lhyfst

mdistiller by megvii-research

Awesome-Dataset-Distillation by Guang000

tinker-cookbook by thinking-machines-lab

awesome-knowledge-distillation by dkozlov