Code for research paper on knowledge distillation
Top 58.7% on sourcepulse
This repository provides code for the paper "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes." It enables users to train smaller language models that match or exceed the performance of larger models while using less training data and compute. The target audience is researchers and practitioners in NLP and machine learning who want to optimize model efficiency and performance.
How It Works
The core approach involves a distillation technique that trains smaller models to mimic the reasoning process of larger models. This is achieved by generating intermediate "rationales" (step-by-step explanations) from a larger LLM such as PaLM, then using these rationales, along with the ground-truth labels, to fine-tune a smaller T5 model. The alpha parameter controls the weighting between the rationale generation loss and the label prediction loss in multi-task training.
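The multi-task objective can be pictured as two forward passes over the same example, one per task prefix, with a weighted sum of the two losses. The sketch below is a minimal illustration of that idea; the "predict:"/"explain:" prefixes, the example text, and which loss receives alpha versus 1 - alpha are illustrative assumptions, not the repository's exact code.

```python
# Minimal sketch of the multi-task distillation objective described above.
# The task prefixes, example text, and loss weighting direction are assumptions.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-small")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-small")

question = "Where would you find a jellyfish that has not been captured?"
label = "ocean"                                                  # ground-truth (or LLM-predicted) label
rationale = "Jellyfish live freely in the ocean unless caught."  # rationale generated by the teacher LLM

alpha = 0.5  # weighting between the two task losses

# The same example is used twice, once per task prefix.
label_batch = tokenizer("predict: " + question, return_tensors="pt")
label_targets = tokenizer(label, return_tensors="pt").input_ids

rationale_batch = tokenizer("explain: " + question, return_tensors="pt")
rationale_targets = tokenizer(rationale, return_tensors="pt").input_ids

label_loss = model(**label_batch, labels=label_targets).loss
rationale_loss = model(**rationale_batch, labels=rationale_targets).loss

# Weighted multi-task loss: alpha trades off label prediction against rationale generation.
loss = alpha * label_loss + (1.0 - alpha) * rationale_loss
loss.backward()
```

In the paper's setup, only the label-prediction task is used at inference time, so the rationale branch adds no cost at deployment.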
Quick Start & Requirements
- Install PyTorch with cudatoolkit=11.3.
- Install Python dependencies via pip, including a specific transformers version (v4.24.0), the datasets library, sentencepiece, protobuf==3.20.*, and tensorboardX (a quick sanity check is sketched after this list).
- Unzip datasets.zip into the datasets/ directory.
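After installation, a quick check along these lines (an illustrative snippet, not part of the repository) confirms that the pinned dependencies are importable:

```python
# Illustrative post-install sanity check (not part of the repository).
import torch
import transformers
import datasets
import sentencepiece   # noqa: F401  (required by the T5 tokenizer)
import tensorboardX    # noqa: F401
import google.protobuf as protobuf

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)  # the README pins v4.24.0
print("datasets:", datasets.__version__)
print("protobuf:", protobuf.__version__)          # should be 3.20.x
```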
Highlighted Details
- Supports training on ground-truth labels (label_type gt) or LLM-predicted labels (label_type llm); see the configuration sketch after this list.
- alpha parameter to balance the label prediction and rationale generation losses.
- task_prefix model type for the "distilling step-by-step" approach.
- Covers multiple T5 sizes (google/t5-v1_1-small to xxl) and datasets (esnli, anli1, cqa, svamp).
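To make these options concrete, here is a hypothetical configuration sketch. The checkpoint names, dataset names, and option values (gt, llm, task_prefix) come from the list above; the variable names, dictionary fields, and selection logic are illustrative assumptions, not the repository's CLI.

```python
# Hypothetical configuration sketch; not the repository's command-line interface.
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "google/t5-v1_1-small"  # any size from small up to xxl can be substituted
dataset = "cqa"                      # one of: esnli, anli1, cqa, svamp
model_type = "task_prefix"           # the "distilling step-by-step" variant
label_type = "gt"                    # "gt" = ground-truth labels, "llm" = LLM-predicted labels

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Choosing the training target according to label_type (fields are illustrative).
example = {"gt_label": "ocean", "llm_label": "the sea"}
target = example["gt_label"] if label_type == "gt" else example["llm_label"]

print(f"Training {checkpoint} on {dataset} with model_type={model_type}, label_type={label_type}")
```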
Maintenance & Community
The project is associated with Google Research. No specific community links (Discord/Slack) or roadmap are provided in the README.
Licensing & Compatibility
The repository itself is not explicitly licensed in the README. The code is likely subject to the Apache 2.0 license of the underlying Google Research projects, but this should be verified. Compatibility for commercial use depends on the licenses of the models and datasets used.
Limitations & Caveats
The setup requires specific, older versions of PyTorch and CUDA, which might pose compatibility challenges with newer hardware or software stacks. The project is presented as code for a specific paper, and its ongoing maintenance status is unclear.
Last activity: 1 year ago. Status: Inactive.