TinyBert by Lisennlp

TinyBERT: Distilled pre-trained language model based on BERT

Created 5 years ago
267 stars

Top 95.9% on SourcePulse

View on GitHub
Project Summary

This repository provides a simplified implementation of TinyBERT, a knowledge distillation framework for pre-trained language models. It aims to make the distillation process more accessible, letting users train their own distilled models on custom datasets.

How It Works

TinyBERT employs a multi-stage knowledge distillation process. It first distills a general-purpose student BERT from a teacher BERT. It then fine-tunes the teacher on task-specific data and distills this fine-tuned teacher into a task-specific student. Distillation minimizes losses over word embeddings, hidden states, and attention matrices, with a final stage that matches the teacher's task predictions. Data augmentation is also incorporated to improve performance.
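For intuition, here is a minimal PyTorch sketch of the layer-wise losses this style of distillation uses. The exact loss terms, layer mapping, and weighting in this repo may differ; `proj` and `layer_map` are illustrative names, not the repo's API.

```python
import torch.nn.functional as F

def intermediate_distill_loss(student_out, teacher_out, proj, layer_map):
    """Stage-one losses: match embeddings, hidden states, and attentions.

    student_out / teacher_out: Hugging Face-style outputs exposing
    .hidden_states and .attentions; proj: an nn.Linear mapping the student
    hidden size to the teacher hidden size; layer_map: the teacher layer
    index assigned to each student layer (illustrative).
    """
    # Embedding-layer loss (hidden_states[0] is the embedding output).
    loss = F.mse_loss(proj(student_out.hidden_states[0]),
                      teacher_out.hidden_states[0])
    for s, t in enumerate(layer_map):
        # Hidden-state loss for each mapped layer pair.
        loss = loss + F.mse_loss(proj(student_out.hidden_states[s + 1]),
                                 teacher_out.hidden_states[t + 1])
        # Attention loss (assumes equal head counts; the TinyBERT paper
        # uses pre-softmax scores, this sketch uses softmaxed attentions).
        loss = loss + F.mse_loss(student_out.attentions[s],
                                 teacher_out.attentions[t])
    return loss

def prediction_distill_loss(student_logits, teacher_logits, T=1.0):
    # Stage-two loss: soft cross-entropy over the teacher's task logits,
    # with temperature T.
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
```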

Quick Start & Requirements

  • General Distillation: sh script/general_train.sh
  • Task Distillation (Stage 1): sh script/task_train.sh one
  • Task Distillation (Stage 2): sh script/task_train.sh two
  • Prerequisites: PyTorch, Hugging Face Transformers, GloVe embeddings (for data augmentation). Requires access to pre-trained BERT models (see the loading sketch after this list).
  • Data Format: data/train.txt, data/eval.txt.
  • Data Augmentation: python data_augmentation.py with parameters for pre-trained BERT model path, data path, GloVe path, and augmentation settings.
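As a rough sketch of what "access to pre-trained BERT models" means in practice, a teacher can be loaded and a smaller student instantiated with Hugging Face Transformers. The checkpoint name below is an assumption, and the 4-layer/312-hidden student config follows the TinyBERT paper, not necessarily this repo's scripts.

```python
from transformers import BertConfig, BertModel

# Teacher: any pre-trained BERT checkpoint (name assumed here); expose
# hidden states and attentions for the intermediate distillation losses.
teacher = BertModel.from_pretrained("bert-base-uncased",
                                    output_hidden_states=True,
                                    output_attentions=True)

# Student: a smaller, randomly initialized BERT. The 4-layer, 312-hidden,
# 1200-intermediate config matches TinyBERT-4 from the paper; the repo's
# general_train.sh may use different sizes.
student_cfg = BertConfig(num_hidden_layers=4, hidden_size=312,
                         num_attention_heads=12, intermediate_size=1200,
                         output_hidden_states=True, output_attentions=True)
student = BertModel(student_cfg)
```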

Highlighted Details

  • Simplified data loading for custom datasets.
  • Multi-stage distillation process for general and task-specific models.
  • Data augmentation strategy using BERT masking and GloVe similarity (sketched after this list).
  • Provides pre-distilled models for GLUE tasks.
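To make the augmentation bullet concrete, here is a minimal sketch of the usual single-piece-vs-multi-piece rule: words that map to one word piece are replaced via BERT's masked-token predictions, others via GloVe nearest neighbors. Function names and the `glove` dict interface are illustrative; the repo's data_augmentation.py will differ in detail.

```python
import numpy as np
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def candidates_for(word, sentence, glove, k=5):
    """Propose k replacement candidates for `word` in `sentence`.

    glove: dict mapping word -> numpy vector (illustrative interface).
    """
    pieces = tokenizer.tokenize(word)
    if len(pieces) == 1:
        # Single-piece word: mask it and let BERT propose replacements.
        # (Naive substring replacement, kept simple for the sketch.)
        masked = sentence.replace(word, tokenizer.mask_token, 1)
        inputs = tokenizer(masked, return_tensors="pt")
        with torch.no_grad():
            logits = mlm(**inputs).logits
        pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
        top = logits[0, pos].topk(k).indices.tolist()
        return tokenizer.convert_ids_to_tokens(top)
    # Multi-piece word: fall back to GloVe cosine similarity.
    if word not in glove:
        return [word]
    v = glove[word]
    sims = {w: float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u) + 1e-8))
            for w, u in glove.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]
```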

Maintenance & Community

The project is a reimplementation based on Huawei's TinyBERT. The README provides no community links or signals of active maintenance.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project targets English data and GloVe embeddings; adapting it to Chinese or other languages requires manually swapping in different pre-trained models and embedding files. Evaluation details in the README are marked "To be continued."

Health Check

  • Last Commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Forrest Iandola (Author of SqueezeNet; Research Scientist at Meta), Chris Van Pelt (Cofounder of Weights & Biases), and 2 more.

mt-dnn by namisan

2k stars
PyTorch package for multi-task deep neural networks research
Created 6 years ago · Updated 1 year ago