DistillKit by arcee-ai

Open-source toolkit for LLM distillation research

created 1 year ago
698 stars

Top 49.8% on sourcepulse

Project Summary

DistillKit is an open-source research toolkit for Large Language Model (LLM) distillation, developed by Arcee.AI. It provides practical tools for researchers and developers to improve LLM performance and efficiency through distillation, targeting users who want to enhance smaller models using larger ones.

How It Works

DistillKit offers two primary distillation methods. Logit-based distillation uses KL divergence to match the output probability distributions of a teacher and a student model, and requires the two models to share an architecture. Hidden states-based distillation instead aligns intermediate-layer representations, which permits cross-architecture distillation and provides richer guidance to the student.
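
To make the logit-based method concrete, here is a minimal NumPy sketch of a temperature-scaled KL distillation objective. This is illustrative only: the function names are hypothetical and are not DistillKit's API.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperatures soften the distribution.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
                axis=-1)
    return float(np.mean(kl) * temperature ** 2)
```

The loss is zero when the student's distribution exactly matches the teacher's and grows as the distributions diverge; in practice it is usually blended with a standard cross-entropy term on the ground-truth labels.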

Quick Start & Requirements

  • Installation: bash ./setup.sh or manual installation with pip install torch wheel ninja packaging flash-attn deepspeed -r requirements.txt.
  • Prerequisites: PyTorch, Flash Attention, DeepSpeed.
  • Usage: accelerate launch distil_logits.py.
  • Configuration: Settings are managed within the training script.
  • Docs: https://arcee.ai (for Arcee.AI platform, not specific DistillKit docs).
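
Putting the steps above together, a typical first run might look like the following. The repository URL is assumed from the project name; everything else comes from the quick-start commands listed above.

```shell
# Clone the repository (URL assumed from the project/author names)
git clone https://github.com/arcee-ai/DistillKit.git
cd DistillKit

# Install dependencies: either the bundled script...
bash ./setup.sh
# ...or the manual route from the README:
# pip install torch wheel ninja packaging flash-attn deepspeed -r requirements.txt

# Launch logit-based distillation; configuration lives inside the script itself
accelerate launch distil_logits.py
```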

Highlighted Details

  • Supports Logit-based and Hidden States-based distillation.
  • Includes Supervised Fine-Tuning (SFT); Direct Preference Optimization (DPO) and Continued Pre-Training (CPT) are planned.
  • Integrates with Spectrum for potential speed improvements (experimental).
  • Achieves performance gains over standard SFT, especially for domain-specific tasks.
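
The hidden states-based method listed above can be sketched as a simple alignment loss. In this minimal NumPy example (hypothetical names; a random linear map stands in for a learned projection), the student's hidden states are projected into the teacher's hidden width and penalized by mean squared distance, which is what makes cross-architecture distillation possible when the two models have different dimensions.

```python
import numpy as np

def hidden_state_loss(student_hidden, teacher_hidden, projection):
    # Map student hidden states into the teacher's hidden width, then
    # penalize the mean squared distance between the aligned representations.
    aligned = student_hidden @ projection  # (tokens, d_student) -> (tokens, d_teacher)
    return float(np.mean((aligned - teacher_hidden) ** 2))

# Toy shapes: a 512-dim student mimicking a 1024-dim teacher.
rng = np.random.default_rng(0)
student_h = rng.normal(size=(8, 512))
teacher_h = rng.normal(size=(8, 1024))
projection = 0.01 * rng.normal(size=(512, 1024))  # stands in for a learned layer
loss = hidden_state_loss(student_h, teacher_h, projection)
```

In a real training loop the projection would be a trainable layer optimized jointly with the student, typically summed over several teacher/student layer pairs.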

Maintenance & Community

  • Developed by Arcee.AI.
  • Community contributions are welcomed via issues and pull requests.
  • For technical questions, open an issue in the repository.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

Memory requirements are higher than for standard SFT. The project is actively working on scaling to models larger than 70B parameters, and the Spectrum integration remains experimental pending further evaluation.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
Star History
113 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of Machine Learning Engineering Open Book; research engineer at Snowflake).

HALOs by ContextualAI

Top 0.3% on sourcepulse
873 stars
Library for aligning LLMs using human-aware loss functions
created 1 year ago
updated 3 weeks ago
Starred by Stas Bekman (author of Machine Learning Engineering Open Book; research engineer at Snowflake) and Travis Fischer (founder of Agentic).

lingua by facebookresearch

Top 0.1% on sourcepulse
5k stars
LLM research codebase for training and inference
created 9 months ago
updated 2 weeks ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 4 days ago