DistillKit by arcee-ai

Open-source toolkit for LLM distillation research

created 1 year ago
698 stars

Top 49.8% on sourcepulse

Project Summary

DistillKit is an open-source research toolkit for Large Language Model (LLM) distillation, developed by Arcee.AI. It provides practical tools for researchers and developers to improve LLM performance and efficiency through distillation, targeting users who want to enhance smaller models using larger ones.

How It Works

DistillKit offers two primary distillation methods. Logit-based distillation uses KL divergence to match the output probability distributions of a teacher and a student model, and requires the two models to share an architecture. Hidden states-based distillation instead aligns intermediate-layer representations, which permits cross-architecture distillation and provides richer guidance to the student.
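
To make the logit-based method concrete, here is a minimal NumPy sketch of a temperature-scaled KL distillation objective. This is illustrative only: the function names are hypothetical and are not DistillKit's API.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperatures soften the distribution.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
                axis=-1)
    return float(np.mean(kl) * temperature ** 2)
```

The loss is zero when the student's distribution exactly matches the teacher's and grows as the distributions diverge; in practice it is usually blended with a standard cross-entropy term on the ground-truth labels.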

Quick Start & Requirements

  • Installation: bash ./setup.sh or manual installation with pip install torch wheel ninja packaging flash-attn deepspeed -r requirements.txt.
  • Prerequisites: PyTorch, Flash Attention, DeepSpeed.
  • Usage: accelerate launch distil_logits.py.
  • Configuration: Settings are managed within the training script.
  • Docs: https://arcee.ai (for Arcee.AI platform, not specific DistillKit docs).
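
Putting the steps above together, a typical first run might look like the following. The repository URL is assumed from the project name; everything else comes from the quick-start commands listed above.

```shell
# Clone the repository (URL assumed from the project/author names)
git clone https://github.com/arcee-ai/DistillKit.git
cd DistillKit

# Install dependencies: either the bundled script...
bash ./setup.sh
# ...or the manual route from the README:
# pip install torch wheel ninja packaging flash-attn deepspeed -r requirements.txt

# Launch logit-based distillation; configuration lives inside the script itself
accelerate launch distil_logits.py
```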

Highlighted Details

  • Supports Logit-based and Hidden States-based distillation.
  • Includes Supervised Fine-Tuning (SFT); Direct Preference Optimization (DPO) and Continued Pre-Training (CPT) are planned.
  • Integrates with Spectrum for potential speed improvements (experimental).
  • Achieves performance gains over standard SFT, especially for domain-specific tasks.
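
The hidden states-based method listed above can be sketched as a simple alignment loss. In this minimal NumPy example (hypothetical names; a random linear map stands in for a learned projection), the student's hidden states are projected into the teacher's hidden width and penalized by mean squared distance, which is what makes cross-architecture distillation possible when the two models have different dimensions.

```python
import numpy as np

def hidden_state_loss(student_hidden, teacher_hidden, projection):
    # Map student hidden states into the teacher's hidden width, then
    # penalize the mean squared distance between the aligned representations.
    aligned = student_hidden @ projection  # (tokens, d_student) -> (tokens, d_teacher)
    return float(np.mean((aligned - teacher_hidden) ** 2))

# Toy shapes: a 512-dim student mimicking a 1024-dim teacher.
rng = np.random.default_rng(0)
student_h = rng.normal(size=(8, 512))
teacher_h = rng.normal(size=(8, 1024))
projection = 0.01 * rng.normal(size=(512, 1024))  # stands in for a learned layer
loss = hidden_state_loss(student_h, teacher_h, projection)
```

In a real training loop the projection would be a trainable layer optimized jointly with the student, typically summed over several teacher/student layer pairs.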

Maintenance & Community

  • Developed by Arcee.AI.
  • Community contributions are welcomed via issues and pull requests.
  • For technical questions, open an issue in the repository.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

Memory requirements are higher than for standard SFT. The project is actively working on scaling to models larger than 70B parameters, and the Spectrum integration remains experimental pending further evaluation.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
Star History
113 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of Machine Learning Engineering Open Book; research engineer at Snowflake).

HALOs by ContextualAI

Top 0.3% on sourcepulse
873 stars
Library for aligning LLMs using human-aware loss functions
created 1 year ago
updated 3 weeks ago
Starred by Stas Bekman (author of Machine Learning Engineering Open Book; research engineer at Snowflake) and Travis Fischer (founder of Agentic).

lingua by facebookresearch

Top 0.1% on sourcepulse
5k stars
LLM research codebase for training and inference
created 9 months ago
updated 2 weeks ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 4 days ago