LLM Distillation Guide for Production Applications
This document is a comprehensive playbook for distilling large language models (LLMs) into smaller, more efficient student models for production use. Aimed at engineers and ML practitioners, it offers practical, research-backed strategies that replace the guesswork common in LLM distillation, balancing capability against cost and speed.
How It Works
The playbook lays out a systematic approach to distillation centered on data quality, teacher model optimization, and rigorous evaluation. Its key principles:

- Understand the limitations of smaller student models up front.
- Build robust logging so teacher outputs can be reused.
- Define clear evaluation criteria, including balanced and in-distribution test sets.
- Maximize teacher output quality through prompt engineering (sketched below).
- Iterate on the quality of the training data.
- Start with simple configurations and add complexity gradually.
- Consider deployment strategies such as LoRAX for efficient serving (second sketch below).
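To make the teacher-side principles concrete, here is a minimal sketch of prompt engineering plus request/response logging, assuming an OpenAI-compatible teacher endpoint. The model name, prompt wording, and log path are illustrative choices, not taken from the playbook.

```python
# Sketch: label raw examples with a teacher LLM and log every exchange
# so the transcripts can later be filtered into student training data.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment;
# the model name and prompt are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()

# An engineered prompt: an explicit task definition, label set, and output
# format tend to raise teacher label quality more than student-side changes.
SYSTEM_PROMPT = (
    "You are a content moderator. Classify the comment as 'toxic' or "
    "'non-toxic'. Respond with the label only."
)

def label_with_teacher(comment: str, log_path: str = "teacher_log.jsonl") -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical teacher choice
        temperature=0,   # deterministic labels simplify later auditing
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": comment},
        ],
    )
    label = response.choices[0].message.content.strip()
    # Robust logging: persist the full input/output pair immediately,
    # so every teacher call is recoverable as a training example.
    with open(log_path, "a") as f:
        f.write(json.dumps({"input": comment, "label": label}) + "\n")
    return label
```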
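On the serving side, LoRAX lets many LoRA-fine-tuned students share a single base-model deployment. A minimal sketch using the `lorax-client` package, assuming a LoRAX server already running locally; the adapter ID is a hypothetical placeholder.

```python
# Sketch: query a distilled student served by LoRAX.
# Assumes `pip install lorax-client` and a LoRAX server on localhost:8080.
from lorax import Client

client = Client("http://127.0.0.1:8080")
response = client.generate(
    "Classify the comment as 'toxic' or 'non-toxic': you are awful",
    adapter_id="my-org/toxicity-student-lora",  # hypothetical adapter
    max_new_tokens=8,
)
print(response.generated_text)
```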
Quick Start & Requirements
This is a documentation repository, not a runnable codebase. The core concepts and best practices are illustrated using the Jigsaw toxic comment classification dataset.
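As an illustration of the balanced, in-distribution test sets the playbook calls for, a holdout split on the Jigsaw data might be built like this. The CSV path and split sizes are assumptions; the column names follow the public Kaggle release.

```python
# Sketch: build a balanced, in-distribution evaluation split from the
# Jigsaw toxic comment data. File path and split sizes are illustrative.
import pandas as pd

df = pd.read_csv("train.csv")  # hypothetical local path to the Jigsaw data

# Toxic comments are a small minority of the corpus, so a random holdout
# would be dominated by the negative class; sample each class equally.
per_class = 500
eval_set = pd.concat([
    df[df["toxic"] == 1].sample(per_class, random_state=0),
    df[df["toxic"] == 0].sample(per_class, random_state=0),
])

# Everything not held out remains available for training, keeping the
# eval set strictly disjoint from the student's training data.
train_set = df.drop(eval_set.index)
print(len(train_set), len(eval_set))
```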
Maintenance & Community
The project is maintained by Predibase, with contributions welcomed via GitHub issues, discussions, and pull requests. Community channels include Ludwig Slack and LoRAX Discord.
Licensing & Compatibility
The repository itself carries no code license. Predibase, the maintaining organization, is committed to open source and develops projects such as Ludwig and LoRAX.
Limitations & Caveats
Distillation is presented as an empirical science: it is not guaranteed to work for every task, especially those requiring broad domain understanding or complex reasoning, and its effectiveness depends heavily on the specific task and data.