albert_zh by brightmart

Chinese ALBERT model for self-supervised learning

created 5 years ago
3,979 stars

Top 12.5% on sourcepulse

Project Summary

This repository provides implementations and pre-trained models for ALBERT, a lighter BERT variant optimized for self-supervised learning of language representations, particularly for Chinese. It offers significantly reduced parameter counts while maintaining competitive accuracy across various NLP tasks, making it suitable for resource-constrained environments or applications requiring faster inference.

How It Works

ALBERT's efficiency stems from three core architectural changes over BERT: factorized embedding parameterization, cross-layer parameter sharing, and a sentence-order prediction (SOP) loss that focuses on coherence rather than topic prediction. These modifications drastically reduce model size and computational requirements. The project also explores removing dropout for increased model capacity and utilizes the LAMB optimizer for large batch training.
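
To make the factorization concrete, here is a back-of-the-envelope sketch in plain Python. The sizes are assumptions chosen to be roughly ALBERT-base with the common ~21k-token Chinese vocabulary; they are not read from this repository.

```python
# Parameter counts: standard embedding table vs. ALBERT's factorized one.
# All sizes below are illustrative assumptions (roughly ALBERT-base).
V = 21_128  # vocab size of the common Chinese BERT/ALBERT vocab
H = 768     # transformer hidden size
E = 128     # factorized embedding size (E << H)

bert_style = V * H            # single V x H embedding matrix
albert_style = V * E + E * H  # V x E lookup, then E x H projection

print(f"unfactorized: {bert_style:,} params")    # 16,226,304
print(f"factorized:   {albert_style:,} params")  # 2,802,688

# Cross-layer parameter sharing compounds this: all encoder layers reuse
# one set of weights, so a 12-layer stack costs ~1/12 of an unshared one.
```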

Quick Start & Requirements

  • Install/Run: Clone the repository and run bash run_classifier_clue.sh for an end-to-end test.
  • Prerequisites: Python 3, TensorFlow 1.x (e.g., 1.14 or 1.15). A GPU is recommended for training and fine-tuning.
  • Setup: Requires downloading pre-trained models and task-specific datasets; a tokenizer sketch follows this list.
  • Docs: CLUE benchmark, TensorFlow Lite guide
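
As a minimal usage sketch, the snippet below tokenizes a sentence with the repo's BERT-style tokenization.py module (albert_zh inherits it from Google's BERT codebase). The vocab path is a placeholder for wherever you unpack a downloaded pre-trained model; adjust it to your setup.

```python
# Minimal tokenization sketch, run from the cloned albert_zh directory.
import tokenization  # BERT-style tokenizer module shipped with the repo

tokenizer = tokenization.FullTokenizer(
    vocab_file="albert_tiny/vocab.txt",  # placeholder: path to your unpacked model
    do_lower_case=True,
)
tokens = tokenizer.tokenize("这个项目提供中文ALBERT预训练模型")
print(tokens)                                   # WordPiece tokens
print(tokenizer.convert_tokens_to_ids(tokens))  # ids fed to the model
```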

Highlighted Details

  • Offers multiple Chinese ALBERT models (tiny, small, base, large, xlarge, xxlarge) with varying parameter counts and performance characteristics.
  • albert_tiny_zh achieves 85.4% on LCQMC with 10x faster inference than BERT-base and a ~60MB memory footprint when converted to TensorFlow Lite (a conversion sketch follows this list).
  • Supports fine-tuning on downstream tasks like sentence pair matching (LCQMC) and natural language inference (XNLI).
  • Includes scripts for pre-training custom models on new corpora.
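
The TensorFlow Lite figure above implies a conversion step along the following lines. This is a hedged sketch using the stock TF 1.x converter, not a script from this repository; the frozen-graph path and tensor names are placeholders that depend on how you export your fine-tuned model.

```python
import tensorflow as tf  # TensorFlow 1.x

# Convert a frozen inference graph to TensorFlow Lite (TF 1.x API).
# The file name and tensor names below are assumptions, not repo-defined values.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="albert_tiny_frozen.pb",
    input_arrays=["input_ids", "input_mask", "segment_ids"],
    output_arrays=["probabilities"],
)
tflite_model = converter.convert()
with open("albert_tiny.tflite", "wb") as f:
    f.write(tflite_model)
```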

Maintenance & Community

  • Development updates are documented through late 2019; the repository has since gone quiet (see Health Check below).
  • QQ group for technical discussion: 836811304.
  • Contact: brightmart@hotmail.com.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. It references Google's ALBERT repository and builds on TensorFlow, both of which are Apache-2.0 licensed, but that does not by itself license this code; users should verify terms before commercial use.

Limitations & Caveats

  • Primarily focused on Chinese NLP tasks; English support for pre-training requires hyperparameter adjustments.
  • TensorFlow 1.x dependency may be a barrier for users on newer TensorFlow versions.
  • Some performance benchmarks and comparisons are marked as "to be added" or "will be updated soon."
Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

xlnet by zihangdai

  • 6k stars
  • Language model research paper using generalized autoregressive pretraining
  • created 6 years ago, updated 2 years ago