jongwooko: Streamlined LLM distillation for efficient model training
Summary
DistiLLM offers an official PyTorch implementation for streamlined knowledge distillation of Large Language Models (LLMs), presented at ICML 2024. It targets researchers and practitioners seeking efficient LLM distillation, providing improved generation performance and training speed over existing baselines.
How It Works
This project implements and evaluates various knowledge distillation techniques, including SFT, KD, SeqKD, ImitKD, MiniLLM, GKD, and its novel DistiLLM approach. Built on HuggingFace Transformers, it standardizes data processing, model training, and evaluation across LLM families like GPT-2, OPT, and OpenLLaMA. Its advantage lies in its comparative framework and streamlined distillation strategy for efficient knowledge transfer.
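DistiLLM's core objective is a skewed KL divergence: a fraction of the teacher distribution is mixed into the student distribution before computing the forward KL, which stabilises training compared with the plain KL used in standard KD. A minimal sketch over raw logits (illustrative only; the function name, default `alpha`, and epsilon handling are assumptions, not the repository's actual API):

```python
import torch
import torch.nn.functional as F

def skew_kl(teacher_logits: torch.Tensor, student_logits: torch.Tensor,
            alpha: float = 0.1) -> torch.Tensor:
    """Skewed KL divergence KL(p || alpha*p + (1-alpha)*q).

    Mixing a fraction of the teacher distribution p into the student
    distribution q bounds the ratio inside the log, taming gradient
    spikes when the student assigns near-zero mass to teacher tokens.
    """
    p = F.softmax(teacher_logits, dim=-1)   # teacher token distribution
    q = F.softmax(student_logits, dim=-1)   # student token distribution
    mix = alpha * p + (1.0 - alpha) * q     # skewed reference distribution
    # KL(p || mix), summed over the vocabulary, averaged over the batch;
    # the small epsilon guards against log(0)
    return (p * (torch.log(p + 1e-9) - torch.log(mix + 1e-9))).sum(dim=-1).mean()
```

With `alpha=0`, this reduces to the ordinary forward KL of standard knowledge distillation; raising `alpha` interpolates toward a smoother, better-conditioned objective.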
Quick Start & Requirements
Install dependencies with `bash install.sh`. Training data must be prepared with the provided scripts (e.g., `tools/get_openwebtext.py`, `scripts/gpt2/tools/process_data_dolly.sh`). Base pre-trained models must be downloaded from the Hugging Face Model Hub or specified via the `CKPT` variable. Reported experiments used 4 × 40GB A100 GPUs, indicating substantial GPU resources are recommended for training.
Maintenance & Community
The repository saw recent activity in early-to-mid 2024, and direct contact is available via email. No explicit community channels or roadmap links are present.
Licensing & Compatibility
No software license is specified, creating ambiguity regarding usage rights and commercial compatibility.
Limitations & Caveats
No explicit limitations, known bugs, or alpha status are documented. Setup requires considerable data preparation and model downloading, and reproducing the reported results necessitates high-end GPU infrastructure. The DistiLLM-2 code is preliminary and will be moved.