MiniRBT by iflytek

Small, distilled Chinese pre-trained language models

created 2 years ago
285 stars

Top 92.8% on sourcepulse

View on GitHub
Project Summary

MiniRBT offers a suite of small, efficient Chinese pre-trained language models designed to address the large parameter counts and slow inference of full-size NLP models. Targeting researchers and developers working with Chinese text, these models provide a practical option for deployment in resource-constrained environments, leveraging knowledge distillation and whole-word masking.

How It Works

MiniRBT models are built with a two-stage knowledge distillation process: a large teacher model first distills its knowledge into an intermediate "teaching assistant" model, which in turn distills into the final student model. This approach, combined with whole-word masking (using LTP for Chinese word segmentation) and a "narrow and deep" architecture (6 Transformer layers with a hidden size of 256 or 288), aims to outperform both traditional single-stage distillation and "wide and shallow" student models on downstream tasks.
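As a rough illustration of the distillation objective only (not the repo's actual TextBrewer configuration), each stage can be viewed as minimizing a temperature-scaled soft-target loss between teacher and student outputs; the function name kd_step and the temperature value below are hypothetical.

```python
# Minimal sketch of one knowledge-distillation step, assuming models with an
# LM head that returns .logits (e.g., BertForMaskedLM). This is NOT the
# repo's TextBrewer pipeline; `kd_step` and temperature=4.0 are illustrative
# assumptions only. The same routine applies to both stages:
# teacher -> teaching assistant, then teaching assistant -> student.
import torch
import torch.nn.functional as F

def kd_step(teacher, student, batch, temperature: float = 4.0):
    with torch.no_grad():
        t_logits = teacher(**batch).logits      # teacher stays frozen
    s_logits = student(**batch).logits

    # Soft-target KL loss, scaled by T^2 as in standard knowledge distillation.
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return loss
```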

Quick Start & Requirements

  • Installation: Load models via 🤗 Transformers: from transformers import BertTokenizer, BertModel. Use model names like "hfl/minirbt-h256" (see the loading sketch after this list).
  • Prerequisites: Python and PyTorch for inference. For pre-training: Python 3.8, PyTorch 1.8.1, and the packages in requirements.txt; pre-training also requires downloading base model weights (e.g., RoBERTa-wwm-ext) and the LTP segmentation model.
  • Resources: Pre-training requires significant data preprocessing and computational resources. Model loading is standard.
  • Links: Hugging Face Models, TextBrewer
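A minimal loading sketch following the Transformers usage above; hfl/minirbt-h256 is the published checkpoint name, and the example sentence is an arbitrary placeholder.

```python
# Load MiniRBT-h256 from the Hugging Face Hub. The h288 variant works the
# same way with "hfl/minirbt-h288". The input text is arbitrary example data.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/minirbt-h256")
model = BertModel.from_pretrained("hfl/minirbt-h256")

inputs = tokenizer("哈工大讯飞联合实验室", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 256)
```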

Highlighted Details

  • Offers MiniRBT-h256 (10.4M params) and MiniRBT-h288 (12.3M params) models, both 6-layer Transformers.
  • Includes RBT4-h312 (11.4M params), a 4-layer TinyBERT-sized model for comparison.
  • Demonstrates competitive performance on various Chinese NLP tasks (CMRC 2018, DRCD, OCNLI, LCQMC, BQ Corpus, TNEWS, ChnSentiCorp), with MiniRBT-h288 showing strength in reading comprehension.
  • Provides pre-training code using the TextBrewer toolkit, including scripts for data preparation and distillation.
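For downstream use, a hedged sketch of plain fine-tuning (without distillation) on a binary Chinese sentence-classification task such as ChnSentiCorp is shown below; the in-memory example texts, label count, and hyperparameters are hypothetical placeholders, not the repo's scripts or TextBrewer recipes.

```python
# Hypothetical fine-tuning sketch for a binary sentiment task.
# Texts, labels, learning rate, and num_labels are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("hfl/minirbt-h256")
model = BertForSequenceClassification.from_pretrained("hfl/minirbt-h256", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

texts = ["酒店环境很好", "电池续航太差了"]   # placeholder training texts
labels = torch.tensor([1, 0])               # placeholder labels

batch = tokenizer(texts, padding=True, return_tensors="pt")
model.train()
outputs = model(**batch, labels=labels)     # returns loss when labels are given
outputs.loss.backward()
optimizer.step()
```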

Maintenance & Community

  • Developed by HFL, the Joint Laboratory of HIT (Harbin Institute of Technology) and iFLYTEK Research.
  • Issues can be submitted via GitHub Issues.

Licensing & Compatibility

  • The README does not explicitly state a license. Model weights are distributed via the Hugging Face Hub; check the model cards and repository before assuming redistribution or commercial-use terms.

Limitations & Caveats

  • The README does not specify a license, which may impact commercial use.
  • The released checkpoints are general-purpose pre-trained models; the authors suggest further fine-tuning, optionally with knowledge distillation, for best downstream performance.
  • Some datasets used for evaluation are not provided directly.
Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 7 stars in the last 90 days
