distillm by jongwooko

Streamlined LLM distillation for efficient model training

Created 2 years ago
252 stars

Top 99.6% on SourcePulse

Summary

DistiLLM offers an official PyTorch implementation for streamlined knowledge distillation of Large Language Models (LLMs), presented at ICML 2024. It targets researchers and practitioners seeking efficient LLM distillation, providing improved generation performance and training speed over existing baselines.

How It Works

This project implements and evaluates several knowledge distillation techniques, including SFT, KD, SeqKD, ImitKD, MiniLLM, GKD, and the repository's own DistiLLM approach. Built on HuggingFace Transformers, it standardizes data processing, model training, and evaluation across LLM families such as GPT-2, OPT, and OpenLLaMA. Its advantage lies in this comparative framework and a streamlined distillation strategy for efficient knowledge transfer.
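As one illustration of the approach described above, the DistiLLM paper's skew KL divergence objective can be sketched in plain Python. This is a minimal sketch based on the paper's formulation, not code from this repository; the function names and the λ = 0.1 default are illustrative assumptions.

```python
import math

def kl_div(p, q):
    # Standard KL(p || q) for discrete distributions given as probability lists.
    # Terms where p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def skew_kl(p, q, lam=0.1):
    # Skewed KL: KL(p || lam*p + (1-lam)*q).
    # Mixing a little of p into q keeps the divergence finite and its
    # gradients stable even where the student q assigns near-zero mass.
    mix = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
    return kl_div(p, mix)

teacher = [0.5, 0.5]
student = [0.9, 0.1]
print(skew_kl(teacher, student))  # smaller than kl_div(teacher, student)
```

In training, `p` and `q` would be teacher and student next-token distributions; the skewing is what makes the objective better behaved than plain forward or reverse KL on low-probability tokens.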

Quick Start & Requirements

  • Primary install / run command: Execute bash install.sh.
  • Non-default prerequisites and dependencies: PyTorch, HuggingFace Transformers. Requires significant data preprocessing (e.g., tools/get_openwebtext.py, scripts/gpt2/tools/process_data_dolly.sh). Base pre-trained models must be downloaded from Huggingface Model Hub or specified via the CKPT variable.
  • Hardware: Experiments were conducted on 4 × A100 40GB GPUs; comparable GPU resources are recommended for training.
  • Links: Direct URLs for the paper, data resources, or DistiLLM-2 code are not provided within the documentation.

Highlighted Details

  • Achieves superior generation performance and training speed over KD baselines for GPT-2, OPT, and OpenLLaMA families.
  • Provides LoRA checkpoints for OpenLLaMA-3B.
  • Supports multiple LLM architectures and distillation methods for comparative analysis.
  • Preliminary code for DistiLLM-2 is available, with final code expected soon.

Maintenance & Community

Recent activity in early-mid 2024. Direct contact via email provided. No explicit community channels or roadmap links are present.

Licensing & Compatibility

No software license is specified, creating ambiguity regarding usage rights and commercial compatibility.

Limitations & Caveats

No explicit limitations, known bugs, or alpha status are documented. Setup requires considerable data preparation and model downloads, and reproducing results requires high-end GPU infrastructure. The DistiLLM-2 code is preliminary and will be moved.

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

awesome-knowledge-distillation by dkozlov — collection of knowledge distillation resources. Top 0.1%, 4k stars. Created 9 years ago; updated 2 months ago.