distilling-step-by-step by google-research

Code for research paper on knowledge distillation

created 2 years ago
554 stars

Top 58.7% on sourcepulse

Project Summary

This repository provides code for the paper "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes." It enables users to train smaller language models that match or exceed the performance of larger models while using less training data and fewer computational resources. The target audience is researchers and practitioners in NLP and machine learning who want to optimize model efficiency and performance.

How It Works

The core approach involves a distillation technique that trains smaller models to mimic the reasoning process of larger models. This is achieved by generating intermediate "rationales" or step-by-step explanations from a larger LLM (like PaLM) and then using these rationales, along with the ground truth labels, to fine-tune a smaller T5 model. The alpha parameter controls the weighting between the rationale generation loss and the label prediction loss in multi-task training.
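For intuition, the alpha-weighted objective can be sketched roughly as follows (a minimal PyTorch-style sketch under the assumption that the loss is alpha times the label loss plus (1 - alpha) times the rationale loss; the function name and exact weighting are illustrative, not taken from the repository):

    import torch.nn.functional as F

    def multitask_loss(label_logits, label_targets,
                       rationale_logits, rationale_targets,
                       alpha=0.5, pad_token_id=0):
        """Hypothetical sketch: the student model is trained both to predict
        labels and to generate rationales, with alpha trading off the two
        sequence-level cross-entropy terms."""
        label_loss = F.cross_entropy(
            label_logits.view(-1, label_logits.size(-1)),
            label_targets.view(-1),
            ignore_index=pad_token_id,
        )
        rationale_loss = F.cross_entropy(
            rationale_logits.view(-1, rationale_logits.size(-1)),
            rationale_targets.view(-1),
            ignore_index=pad_token_id,
        )
        # alpha balances label prediction against rationale generation.
        return alpha * label_loss + (1.0 - alpha) * rationale_loss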

Quick Start & Requirements

  • Install: Create a Conda environment with PyTorch 1.12.1, torchvision 0.13.1, torchaudio 0.12.1, and cudatoolkit=11.3, then install the Python dependencies via pip, including transformers v4.24.0 (a sanity-check sketch follows this list).
  • Prerequisites: Python 3.10.6, Conda, PyTorch with CUDA 11.3, the datasets library, sentencepiece, protobuf==3.20.*, and tensorboardX. Unzip datasets.zip into the datasets/ directory.
  • Resources: Requires a GPU with CUDA 11.3 support; setup consists of environment creation and dependency installation.
  • Docs: Hugging Face Transformers
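Once the environment is created, a short sanity check along the following lines can confirm that the installed transformers and PyTorch versions load one of the supported T5 checkpoints (a minimal sketch; the prompt text is arbitrary and the script is not part of the repository):

    import torch
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # Load the smallest supported T5 v1.1 checkpoint and run one generation step.
    model_name = "google/t5-v1_1-small"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # Use the GPU if CUDA 11.3 is set up correctly, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    inputs = tokenizer("A quick smoke test for the environment.", return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))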

Highlighted Details

  • Supports fine-tuning with either ground truth labels (label_type gt) or LLM-predicted labels (label_type llm).
  • Enables multi-task training with an alpha parameter to balance label prediction and rationale generation losses.
  • Offers a task_prefix model type for the "distilling step-by-step" approach (see the input-expansion sketch after this list).
  • Compatible with a range of T5 model sizes (google/t5-v1_1-small through xxl) and datasets (esnli, anli1, cqa, svamp).
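To illustrate the task_prefix idea referenced above, a single raw example can be expanded into one label-prediction instance and one rationale-generation instance (a sketch; the "predict:"/"explain:" prefixes and the helper function are hypothetical, not taken from the repository's data pipeline):

    def expand_example(question, label, rationale):
        """Hypothetical task_prefix-style expansion: one raw example yields two
        training instances, one per task. The prefix strings are illustrative."""
        return [
            {"input": f"predict: {question}", "target": label},
            {"input": f"explain: {question}", "target": rationale},
        ]

    # With label_type gt the target label is the ground truth; with label_type llm
    # it would instead be the larger LLM's predicted label.
    examples = expand_example(
        question="Where would you most likely find a seashell?",
        label="beach",
        rationale="Seashells wash up on beaches, so a beach is the most likely place.",
    )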

Maintenance & Community

The project is associated with Google Research. No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

The repository itself is not explicitly licensed in the README. The code is likely subject to the Apache 2.0 license of the underlying Google Research projects, but this should be verified. Compatibility for commercial use depends on the licenses of the models and datasets used.

Limitations & Caveats

The setup requires specific, older versions of PyTorch and CUDA, which might pose compatibility challenges with newer hardware or software stacks. The project is presented as code for a specific paper, and its ongoing maintenance status is unclear.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 27 stars in the last 90 days
