distilling-step-by-step by google-research

Code for research paper on knowledge distillation

Created 2 years ago
557 stars

Top 57.5% on SourcePulse

Project Summary

This repository provides code for the paper "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes." It enables users to train smaller language models to achieve performance comparable to or exceeding larger models, using less data and computational resources. The target audience includes researchers and practitioners in NLP and machine learning looking to optimize model efficiency and performance.

How It Works

The core approach involves a distillation technique that trains smaller models to mimic the reasoning process of larger models. This is achieved by generating intermediate "rationales" or step-by-step explanations from a larger LLM (like PaLM) and then using these rationales, along with the ground truth labels, to fine-tune a smaller T5 model. The alpha parameter controls the weighting between the rationale generation loss and the label prediction loss in multi-task training.
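
In rough terms, the multi-task objective is a weighted sum of the two losses. The sketch below is a simplified illustration under the assumption that both tasks use token-level cross-entropy and that alpha weights the label-prediction term; it is not the repository's exact trainer code.

```python
import torch
import torch.nn.functional as F


def multitask_loss(label_logits: torch.Tensor, label_targets: torch.Tensor,
                   rationale_logits: torch.Tensor, rationale_targets: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of label-prediction and rationale-generation losses (sketch)."""
    label_loss = F.cross_entropy(
        label_logits.reshape(-1, label_logits.size(-1)),
        label_targets.reshape(-1),
        ignore_index=-100,  # standard padding-mask convention for seq2seq targets
    )
    rationale_loss = F.cross_entropy(
        rationale_logits.reshape(-1, rationale_logits.size(-1)),
        rationale_targets.reshape(-1),
        ignore_index=-100,
    )
    # alpha balances the two tasks; alpha=1.0 reduces to standard label fine-tuning.
    return alpha * label_loss + (1.0 - alpha) * rationale_loss
```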

Quick Start & Requirements

  • Install: Create a Conda environment with pinned PyTorch (1.12.1), torchvision (0.13.1), torchaudio (0.12.1), and cudatoolkit=11.3, then install Python dependencies via pip, including transformers v4.24.0.
  • Prerequisites: Python 3.10.6, Conda, PyTorch with CUDA 11.3, the datasets library, sentencepiece, protobuf==3.20.*, and tensorboardX. Unzip datasets.zip into the datasets/ directory.
  • Resources: Requires a GPU with CUDA 11.3. Setup consists of environment creation and dependency installation; a quick environment sanity check is sketched after this list.
  • Docs: Hugging Face Transformers
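
As a quick sanity check of the pinned environment, the snippet below prints the installed PyTorch and Transformers versions and loads the smallest supported T5 checkpoint. Loading the model assumes network access or a local Hugging Face cache; the example input is illustrative only.

```python
import torch
import transformers
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Confirm the pinned stack (PyTorch 1.12.1 + CUDA 11.3, transformers 4.24.0).
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)

# Load the smallest T5 size listed in this summary to verify tokenizer and model loading.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-small")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-small")

inputs = tokenizer("explain: Where would you find a seashell?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```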

Highlighted Details

  • Supports fine-tuning with either ground truth labels (label_type gt) or LLM-predicted labels (label_type llm).
  • Enables multi-task training with an alpha parameter to balance label prediction and rationale generation losses.
  • Offers a task_prefix model type for the "distilling step-by-step" approach; an input-formatting sketch follows this list.
  • Compatible with various T5 model sizes (google/t5-v1_1-small to xxl) and datasets (esnli, anli1, cqa, svamp).
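
To illustrate the task_prefix idea, the sketch below expands one annotated example into the two training pairs used in multi-task training: label prediction and rationale generation. The "predict:"/"explain:" prefixes and field names are illustrative assumptions, not necessarily the strings used in the repository.

```python
def expand_example(question: str, answer: str, rationale: str) -> list[dict]:
    """Turn one annotated example into two task-prefixed training pairs.

    Illustrative sketch: the prefixes and field names are assumptions; the
    repository's preprocessing may use different conventions.
    """
    return [
        # Task 1: predict the (ground-truth or LLM-provided) label.
        {"input": f"predict: {question}", "target": answer},
        # Task 2: generate the rationale extracted from the larger LLM.
        {"input": f"explain: {question}", "target": rationale},
    ]


pairs = expand_example(
    question="Where would you find a seashell?",
    answer="beach",
    rationale="Seashells wash up on the shore, so a beach is where one is found.",
)
for pair in pairs:
    print(pair["input"], "->", pair["target"])
```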

Maintenance & Community

The project is associated with Google Research. No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

The repository itself is not explicitly licensed in the README. The code is likely subject to the Apache 2.0 license of the underlying Google Research projects, but this should be verified. Compatibility for commercial use depends on the licenses of the models and datasets used.

Limitations & Caveats

The setup requires specific, older versions of PyTorch and CUDA, which might pose compatibility challenges with newer hardware or software stacks. The project is presented as code for a specific paper, and its ongoing maintenance status is unclear.

Health Check
  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

awesome-knowledge-distillation by dkozlov

Top 0.1% · 4k stars
Collection of knowledge distillation resources
Created 8 years ago · Updated 3 months ago