MiniRBT by iflytek

Small, distilled Chinese pre-trained language models

created 2 years ago
285 stars

Top 92.8% on sourcepulse

View on GitHub
Project Summary

MiniRBT offers a suite of small, efficient Chinese pre-trained language models designed to address the large parameter counts and slow inference of full-size NLP models. Targeting researchers and developers working with Chinese text, these models provide a practical option for deployment in resource-constrained environments, leveraging knowledge distillation and whole-word masking.

How It Works

MiniRBT models are built with a two-stage knowledge distillation process: a large teacher model first distills its knowledge into an intermediate "teaching assistant" model, which in turn distills into the final student model. This approach, combined with whole-word masking (using LTP for Chinese word segmentation) and a "narrow and deep" architecture (6 Transformer layers with a hidden size of 256 or 288), aims to outperform both traditional single-stage distillation and "wide and shallow" student models on downstream tasks.
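As a rough illustration of the distillation objective only (not the repo's actual TextBrewer configuration), each stage can be viewed as minimizing a temperature-scaled soft-target loss between teacher and student outputs; the function name kd_step and the temperature value below are hypothetical.

```python
# Minimal sketch of one knowledge-distillation step, assuming models with an
# LM head that returns .logits (e.g., BertForMaskedLM). This is NOT the
# repo's TextBrewer pipeline; `kd_step` and temperature=4.0 are illustrative
# assumptions only. The same routine applies to both stages:
# teacher -> teaching assistant, then teaching assistant -> student.
import torch
import torch.nn.functional as F

def kd_step(teacher, student, batch, temperature: float = 4.0):
    with torch.no_grad():
        t_logits = teacher(**batch).logits      # teacher stays frozen
    s_logits = student(**batch).logits

    # Soft-target KL loss, scaled by T^2 as in standard knowledge distillation.
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return loss
```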

Quick Start & Requirements

  • Installation: Load models via 🤗 Transformers: from transformers import BertTokenizer, BertModel. Use model names like "hfl/minirbt-h256" (see the loading sketch after this list).
  • Prerequisites: Python and PyTorch for inference. For pre-training: Python 3.8, PyTorch 1.8.1, and the packages in requirements.txt; pre-training also requires downloading base model weights (e.g., RoBERTa-wwm-ext) and the LTP segmentation model.
  • Resources: Pre-training requires significant data preprocessing and computational resources. Model loading is standard.
  • Links: Hugging Face Models, TextBrewer
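A minimal loading sketch following the Transformers usage above; hfl/minirbt-h256 is the published checkpoint name, and the example sentence is an arbitrary placeholder.

```python
# Load MiniRBT-h256 from the Hugging Face Hub. The h288 variant works the
# same way with "hfl/minirbt-h288". The input text is arbitrary example data.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/minirbt-h256")
model = BertModel.from_pretrained("hfl/minirbt-h256")

inputs = tokenizer("哈工大讯飞联合实验室", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 256)
```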

Highlighted Details

  • Offers MiniRBT-h256 (10.4M params) and MiniRBT-h288 (12.3M params) models, both 6-layer Transformers.
  • Includes RBT4-h312 (11.4M params), a 4-layer TinyBERT-sized model for comparison.
  • Demonstrates competitive performance on various Chinese NLP tasks (CMRC 2018, DRCD, OCNLI, LCQMC, BQ Corpus, TNEWS, ChnSentiCorp), with MiniRBT-h288 showing strength in reading comprehension.
  • Provides pre-training code using the TextBrewer toolkit, including scripts for data preparation and distillation.
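For downstream use, a hedged sketch of plain fine-tuning (without distillation) on a binary Chinese sentence-classification task such as ChnSentiCorp is shown below; the in-memory example texts, label count, and hyperparameters are hypothetical placeholders, not the repo's scripts or TextBrewer recipes.

```python
# Hypothetical fine-tuning sketch for a binary sentiment task.
# Texts, labels, learning rate, and num_labels are illustrative assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("hfl/minirbt-h256")
model = BertForSequenceClassification.from_pretrained("hfl/minirbt-h256", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

texts = ["酒店环境很好", "电池续航太差了"]   # placeholder training texts
labels = torch.tensor([1, 0])               # placeholder labels

batch = tokenizer(texts, padding=True, return_tensors="pt")
model.train()
outputs = model(**batch, labels=labels)     # returns loss when labels are given
outputs.loss.backward()
optimizer.step()
```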

Maintenance & Community

  • Developed by HFL, the Joint Laboratory of HIT (Harbin Institute of Technology) and iFLYTEK Research.
  • Issues can be submitted via GitHub Issues.

Licensing & Compatibility

  • The README does not explicitly state a license. Model weights are distributed via the Hugging Face Hub; check the model cards and repository before assuming redistribution or commercial-use terms.

Limitations & Caveats

  • The README does not specify a license, which may impact commercial use.
  • The released checkpoints are general-purpose pre-trained models; the authors suggest further fine-tuning, optionally with knowledge distillation, for best downstream performance.
  • Some datasets used for evaluation are not provided directly.
Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 7 stars in the last 90 days
