DistillKit by arcee-ai

Open-source toolkit for LLM distillation research

Created 1 year ago
726 stars

Top 47.5% on SourcePulse

View on GitHub
Project Summary

DistillKit is an open-source research toolkit for Large Language Model (LLM) distillation, developed by Arcee.AI. It provides practical tools for researchers and developers to improve LLM performance and efficiency through distillation, targeting users who want to enhance smaller models using larger ones.

How It Works

DistillKit offers two primary distillation methods. Logit-based distillation uses KL divergence to match the output probability distributions of a teacher and a student model, and requires the two models to share the same architecture. Hidden states-based distillation aligns intermediate layer representations instead, which allows cross-architecture distillation and provides richer guidance to the student.
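As an illustration only (not DistillKit's actual code), the minimal PyTorch sketch below shows what the two objectives look like. It assumes logits of shape (batch, seq, vocab) and hidden states of shape (batch, seq, dim); the temperature, the alpha weighting, and the learned projection layer are illustrative assumptions rather than DistillKit settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, labels,
                            temperature=2.0, alpha=0.5):
    """Blend a softened KL-divergence term with ordinary cross-entropy."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes stable.
    kd_loss = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2
    # Standard next-token cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                              labels.view(-1))
    return alpha * kd_loss + (1.0 - alpha) * ce_loss

def hidden_state_distillation_loss(student_hidden, teacher_hidden, projection):
    """Align a student layer with a teacher layer via a learned linear
    projection, so the two models need not share hidden sizes."""
    projected = projection(student_hidden)  # (batch, seq, teacher_dim)
    return F.mse_loss(projected, teacher_hidden)

# Example: project a 2048-dim student layer onto a 4096-dim teacher layer.
projection = nn.Linear(2048, 4096)
```

The projection layer (or any comparable alignment mechanism) is what makes cross-architecture distillation possible: the loss is computed in the teacher's representation space regardless of the student's internal dimensions.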

Quick Start & Requirements

  • Installation: run bash ./setup.sh, or install manually with pip install torch wheel ninja packaging flash-attn deepspeed -r requirements.txt.
  • Prerequisites: PyTorch, Flash Attention, DeepSpeed.
  • Usage: accelerate launch distil_logits.py.
  • Configuration: Settings are managed within the training script (a hypothetical sketch follows this list).
  • Docs: https://arcee.ai (for Arcee.AI platform, not specific DistillKit docs).
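Because settings live inside the training script rather than a separate config file, adapting a run typically means editing the configuration at the top of distil_logits.py before launching. The sketch below is a hypothetical illustration of that pattern; the key names and values (teacher/student model IDs, temperature, alpha, batch sizes) are assumptions, not DistillKit's actual schema.

```python
# Hypothetical configuration block edited directly inside the training script;
# DistillKit's real key names and defaults may differ.
config = {
    "teacher_model": "org/large-teacher-model",   # assumed Hugging Face model IDs
    "student_model": "org/small-student-model",
    "dataset": "path/or/hub-id-of-training-data",
    "distillation": {
        "temperature": 2.0,   # softening temperature for the KL term
        "alpha": 0.5,         # weight between KD loss and cross-entropy
    },
    "training": {
        "per_device_batch_size": 1,
        "gradient_accumulation_steps": 8,
        "learning_rate": 2e-5,
        "num_epochs": 3,
    },
}
```

After editing, the run is started with accelerate launch distil_logits.py as listed above.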

Highlighted Details

  • Supports Logit-based and Hidden States-based distillation.
  • Includes Supervised Fine-Tuning (SFT); DPO and CPT planned.
  • Integrates with Spectrum for potential speed improvements (experimental).
  • Achieves performance gains over standard SFT, especially for domain-specific tasks.

Maintenance & Community

  • Developed by Arcee.AI.
  • Community contributions are welcomed via issues and pull requests.
  • For technical questions, open an issue in the repository.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

Memory requirements are higher than for standard SFT. The project is actively working on scaling to models larger than 70B parameters, and Spectrum integration is still marked TBD pending further evaluation.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

awesome-knowledge-distillation by dkozlov

  Top 0.1% on SourcePulse · 4k stars
  Collection of knowledge distillation resources
  Created 8 years ago · Updated 3 months ago
  Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Lewis Tunstall (Research Engineer at Hugging Face), and 15 more.

torchtune by pytorch

  Top 0.2% on SourcePulse · 5k stars
  PyTorch library for LLM post-training and experimentation
  Created 1 year ago · Updated 1 day ago