idanshen: Continual learning for foundation models via self-distillation
This repository provides TRL-based code for reproducing the On-Policy Self-Distillation Fine-Tuning (SDFT) algorithm, a method designed to enable continual learning in foundation models. SDFT allows models to acquire new skills and knowledge from demonstrations without degrading existing capabilities, offering a practical solution when explicit reward functions are unavailable. It benefits researchers and engineers seeking to incrementally enhance foundation models while mitigating catastrophic forgetting.
How It Works
This repository implements On-Policy Self-Distillation Fine-Tuning (SDFT) with the TRL library. SDFT relies on in-context learning: the model conditioned on a demonstration acts as the teacher, and the same model without the demonstration acts as the student. The student generates its own responses, so the training signal is on-policy, and it is trained to match the teacher's token distributions on those responses. Standard Supervised Fine-Tuning (SFT), by contrast, trains off-policy on fixed demonstration tokens. The authors report that SDFT consistently outperforms SFT on skill-acquisition and knowledge-acquisition tasks while substantially reducing catastrophic forgetting.
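The teacher/student setup can be sketched in a few lines. This is a minimal, non-authoritative illustration under stated assumptions: plain Python lists of logits stand in for model outputs so it runs without PyTorch or TRL, the teacher prompt template and all function names are hypothetical, and the per-token reverse KL (student relative to teacher) is an assumed choice of divergence that is common in on-policy distillation. The repository's actual objective, prompt format, and masking may differ. In a real run, the student first samples a response y from the student prompt, and both prompts are then scored on the same tokens of y.

```python
import math
from typing import Sequence


def build_student_prompt(query: str) -> str:
    # The student sees only the task input.
    return query


def build_teacher_prompt(query: str, demonstration: str) -> str:
    # Hypothetical template (assumption): the same model becomes the teacher
    # by seeing an expert demonstration in context before the query.
    return (
        "Here is an example of a correct response:\n"
        f"{demonstration}\n\n"
        f"Now answer the following:\n{query}"
    )


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    # KL(p_student || p_teacher) over the vocabulary at one position.
    # Reverse KL is an assumed divergence, not confirmed by the source.
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher vocabularies must match")
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def sdft_loss(
    student_steps: Sequence[Sequence[float]],
    teacher_steps: Sequence[Sequence[float]],
) -> float:
    """Mean per-token reverse KL along one student-sampled response.

    student_steps[t]: logits given the student prompt and y_<t
    teacher_steps[t]: logits from the same model given the teacher prompt and y_<t
    In a gradient-based trainer the teacher pass would typically be detached.
    """
    if not student_steps or len(student_steps) != len(teacher_steps):
        raise ValueError("need matching, non-empty per-token logits")
    total = sum(token_reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps))
    return total / len(student_steps)
```

Identical student and teacher logits give a loss of 0; the loss grows as the student's next-token distribution drifts from what the demonstration-conditioned teacher would predict on the student's own tokens.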
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Run training: python main.py --model_name <path_to_model> --output_dir <output_path>
Example model: Qwen/Qwen2.5-7B-Instruct
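As a hedged sketch, the documented training command can be wrapped in a small Python launcher, for example to script several runs. Only --model_name and --output_dir come from the README; the helper names build_sdft_command and run_sdft and the output path are hypothetical, and the sketch assumes dependencies are installed and that it runs from the repository root where main.py lives.

```python
import shlex
import subprocess
import sys


def build_sdft_command(model_name: str, output_dir: str, script: str = "main.py") -> list[str]:
    # Hypothetical helper: assembles the README's command as an argv list.
    # Only --model_name and --output_dir are documented; no other flags are assumed.
    if not model_name or not output_dir:
        raise ValueError("model_name and output_dir must be non-empty")
    return [sys.executable, script, "--model_name", model_name, "--output_dir", output_dir]


def run_sdft(model_name: str, output_dir: str, dry_run: bool = True) -> list[str]:
    # Hypothetical helper: prints the command and, unless dry_run is set,
    # executes it from the current directory (assumed to be the repo root).
    cmd = build_sdft_command(model_name, output_dir)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Dry run only: prints the command without launching training.
    run_sdft("Qwen/Qwen2.5-7B-Instruct", "outputs/sdft-qwen2.5-7b", dry_run=True)
```

With dry_run=True the launcher only prints the command, which is a cheap way to check model and output paths before committing GPU time.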
Maintenance & Community
The provided README does not contain information regarding notable contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap.
Licensing & Compatibility
The README does not specify the software license. Therefore, compatibility for commercial use or closed-source linking cannot be determined from the provided text.
Limitations & Caveats
The implementation targets reproducing the paper's results on specific hardware (a single NVIDIA H200 GPU); other setups may require significant refactoring or a different model size. The repository appears to be an experimental reproduction rather than a general-purpose library.