albert_zh by brightmart

Chinese ALBERT model for self-supervised learning

created 5 years ago
3,979 stars

Top 12.5% on sourcepulse

Project Summary

This repository provides implementations and pre-trained models for ALBERT, a lighter BERT variant optimized for self-supervised learning of language representations, particularly for Chinese. It offers significantly reduced parameter counts while maintaining competitive accuracy across various NLP tasks, making it suitable for resource-constrained environments or applications requiring faster inference.

How It Works

ALBERT's efficiency stems from three core architectural changes over BERT: factorized embedding parameterization, cross-layer parameter sharing, and a sentence-order prediction (SOP) loss that focuses on coherence rather than topic prediction. These modifications drastically reduce model size and computational requirements. The project also explores removing dropout for increased model capacity and utilizes the LAMB optimizer for large batch training.
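
To make the factorization concrete, here is a back-of-the-envelope sketch in plain Python. The sizes are assumptions chosen to be roughly ALBERT-base with the common ~21k-token Chinese vocabulary; they are not read from this repository.

```python
# Parameter counts: standard embedding table vs. ALBERT's factorized one.
# All sizes below are illustrative assumptions (roughly ALBERT-base).
V = 21_128  # vocab size of the common Chinese BERT/ALBERT vocab
H = 768     # transformer hidden size
E = 128     # factorized embedding size (E << H)

bert_style = V * H            # single V x H embedding matrix
albert_style = V * E + E * H  # V x E lookup, then E x H projection

print(f"unfactorized: {bert_style:,} params")    # 16,226,304
print(f"factorized:   {albert_style:,} params")  # 2,802,688

# Cross-layer parameter sharing compounds this: all encoder layers reuse
# one set of weights, so a 12-layer stack costs ~1/12 of an unshared one.
```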

Quick Start & Requirements

  • Install/Run: Clone the repository and run bash run_classifier_clue.sh for an end-to-end test.
  • Prerequisites: Python 3, TensorFlow 1.x (e.g., 1.14 or 1.15). A GPU is recommended for training and fine-tuning.
  • Setup: Requires downloading pre-trained models and task-specific datasets; a tokenizer sketch follows this list.
  • Docs: CLUE benchmark, TensorFlow Lite guide
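
As a minimal usage sketch, the snippet below tokenizes a sentence with the repo's BERT-style tokenization.py module (albert_zh inherits it from Google's BERT codebase). The vocab path is a placeholder for wherever you unpack a downloaded pre-trained model; adjust it to your setup.

```python
# Minimal tokenization sketch, run from the cloned albert_zh directory.
import tokenization  # BERT-style tokenizer module shipped with the repo

tokenizer = tokenization.FullTokenizer(
    vocab_file="albert_tiny/vocab.txt",  # placeholder: path to your unpacked model
    do_lower_case=True,
)
tokens = tokenizer.tokenize("这个项目提供中文ALBERT预训练模型")
print(tokens)                                   # WordPiece tokens
print(tokenizer.convert_tokens_to_ids(tokens))  # ids fed to the model
```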

Highlighted Details

  • Offers multiple Chinese ALBERT models (tiny, small, base, large, xlarge, xxlarge) with varying parameter counts and performance characteristics.
  • albert_tiny_zh achieves 85.4% on LCQMC with 10x faster inference than BERT-base and a ~60MB memory footprint when converted to TensorFlow Lite (a conversion sketch follows this list).
  • Supports fine-tuning on downstream tasks like sentence pair matching (LCQMC) and natural language inference (XNLI).
  • Includes scripts for pre-training custom models on new corpora.
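
The TensorFlow Lite figure above implies a conversion step along the following lines. This is a hedged sketch using the stock TF 1.x converter, not a script from this repository; the frozen-graph path and tensor names are placeholders that depend on how you export your fine-tuned model.

```python
import tensorflow as tf  # TensorFlow 1.x

# Convert a frozen inference graph to TensorFlow Lite (TF 1.x API).
# The file name and tensor names below are assumptions, not repo-defined values.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="albert_tiny_frozen.pb",
    input_arrays=["input_ids", "input_mask", "segment_ids"],
    output_arrays=["probabilities"],
)
tflite_model = converter.convert()
with open("albert_tiny.tflite", "wb") as f:
    f.write(tflite_model)
```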

Maintenance & Community

  • Development updates are documented through late 2019; the repository has since gone quiet (see Health Check below).
  • QQ group for technical discussion: 836811304.
  • Contact: brightmart@hotmail.com.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. It references Google's ALBERT repository and builds on TensorFlow, both of which are Apache-2.0 licensed, but that does not by itself license this code; users should verify terms before commercial use.

Limitations & Caveats

  • Primarily focused on Chinese NLP tasks; English support for pre-training requires hyperparameter adjustments.
  • TensorFlow 1.x dependency may be a barrier for users on newer TensorFlow versions.
  • Some performance benchmarks and comparisons are marked as "to be added" or "will be updated soon."
Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

xlnet by zihangdai

  • 6k stars
  • Language model research paper using generalized autoregressive pretraining
  • created 6 years ago, updated 2 years ago