Small, distilled Chinese pre-trained language models
Top 92.8% on sourcepulse
MiniRBT is a suite of small, efficient Chinese pre-trained language models designed to address the large parameter counts and slow inference times of full-size NLP models. Targeting researchers and developers working with Chinese text, the models leverage knowledge distillation and whole-word masking to remain practical for deployment in resource-constrained environments.
How It Works
MiniRBT models are built with a two-stage knowledge distillation process: a large teacher model first distills its knowledge into an intermediate "teaching assistant" model, which in turn distills into the final student model. Combined with whole-word masking (using LTP for segmentation) and a "narrow and deep" network architecture (6 layers with a hidden size of 256 or 288), this approach aims to improve downstream task performance over traditional single-stage distillation and "wide and shallow" models.
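Each stage relies on a soft-label distillation objective in which the smaller model matches the larger model's output distribution. The following is a minimal sketch of that idea only; the function name distill_loss and the temperature value are illustrative and not taken from the MiniRBT training code, which may combine this objective with additional losses.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=4.0):
    # Temperature-scaled KL divergence between the student's and the
    # (frozen) teacher's output distributions (illustrative only).
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

# Stage 1: the large teacher (e.g., RoBERTa-wwm-ext) distills into an
#          intermediate "teaching assistant" model.
# Stage 2: the trained teaching assistant distills into the final
#          narrow-and-deep student (6 layers, hidden size 256 or 288).
```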
Quick Start & Requirements
- Load the models with Hugging Face Transformers (from transformers import BertTokenizer, BertModel), using model names like "hfl/minirbt-h256"; see the sketch after this list.
- Install Python dependencies from requirements.txt.
- Requires downloading the base model weights (e.g., RoBERTa-wwm-ext) and the LTP segmentation models.
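A minimal loading and inference sketch, assuming the hfl/minirbt-h256 checkpoint is available on the Hugging Face Hub and transformers is installed from requirements.txt; the example sentence is arbitrary:

```python
from transformers import BertTokenizer, BertModel

# MiniRBT checkpoints load with the standard BERT classes.
tokenizer = BertTokenizer.from_pretrained("hfl/minirbt-h256")
model = BertModel.from_pretrained("hfl/minirbt-h256")

# Encode a short Chinese sentence and run a forward pass.
inputs = tokenizer("自然语言处理很有趣。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 256) for the h256 model
```

For downstream fine-tuning, the same checkpoint name can be passed to task-specific Transformers classes such as BertForSequenceClassification.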
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats