distillm by jongwooko

Streamlined LLM distillation for efficient model training

Created 2 years ago
252 stars

Top 99.6% on SourcePulse

Summary

DistiLLM offers an official PyTorch implementation for streamlined knowledge distillation of Large Language Models (LLMs), presented at ICML 2024. It targets researchers and practitioners seeking efficient LLM distillation, providing improved generation performance and training speed over existing baselines.

How It Works

This project implements and evaluates several knowledge distillation techniques, including SFT, KD, SeqKD, ImitKD, MiniLLM, GKD, and the repository's own DistiLLM approach. Built on HuggingFace Transformers, it standardizes data processing, model training, and evaluation across LLM families such as GPT-2, OPT, and OpenLLaMA. Its advantage lies in this comparative framework and a streamlined distillation strategy for efficient knowledge transfer.
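As one illustration of the approach described above, the DistiLLM paper's skew KL divergence objective can be sketched in plain Python. This is a minimal sketch based on the paper's formulation, not code from this repository; the function names and the λ = 0.1 default are illustrative assumptions.

```python
import math

def kl_div(p, q):
    # Standard KL(p || q) for discrete distributions given as probability lists.
    # Terms where p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def skew_kl(p, q, lam=0.1):
    # Skewed KL: KL(p || lam*p + (1-lam)*q).
    # Mixing a little of p into q keeps the divergence finite and its
    # gradients stable even where the student q assigns near-zero mass.
    mix = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
    return kl_div(p, mix)

teacher = [0.5, 0.5]
student = [0.9, 0.1]
print(skew_kl(teacher, student))  # smaller than kl_div(teacher, student)
```

In training, `p` and `q` would be teacher and student next-token distributions; the skewing is what makes the objective better behaved than plain forward or reverse KL on low-probability tokens.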

Quick Start & Requirements

  • Primary install / run command: Execute bash install.sh.
  • Non-default prerequisites and dependencies: PyTorch, HuggingFace Transformers. Requires significant data preprocessing (e.g., tools/get_openwebtext.py, scripts/gpt2/tools/process_data_dolly.sh). Base pre-trained models must be downloaded from Huggingface Model Hub or specified via the CKPT variable.
  • Hardware: Experiments were conducted on 4 × A100 40GB GPUs; comparable GPU resources are recommended for training.
  • Links: Direct URLs for the paper, data resources, or DistiLLM-2 code are not provided within the documentation.

Highlighted Details

  • Achieves superior generation performance and training speed over KD baselines for GPT-2, OPT, and OpenLLaMA families.
  • Provides LoRA checkpoints for OpenLLaMA-3B.
  • Supports multiple LLM architectures and distillation methods for comparative analysis.
  • Preliminary code for DistiLLM-2 is available, with final code expected soon.

Maintenance & Community

Recent activity in early-mid 2024. Direct contact via email provided. No explicit community channels or roadmap links are present.

Licensing & Compatibility

No software license is specified, creating ambiguity regarding usage rights and commercial compatibility.

Limitations & Caveats

No explicit limitations, known bugs, or alpha status are documented. Setup requires considerable data preparation and model downloads, and reproducing results requires high-end GPU infrastructure. The DistiLLM-2 code is preliminary and will be moved.

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

awesome-knowledge-distillation by dkozlov — collection of knowledge distillation resources. Top 0.1%, 4k stars. Created 9 years ago; updated 2 months ago.