albert by google-research

ALBERT is a "Lite BERT" for self-supervised language representation learning; this repository accompanies the research paper.

created 5 years ago
3,273 stars

Top 15.1% on sourcepulse

Project Summary

ALBERT is a "lite" version of BERT, offering parameter-reduction techniques for efficient language representation learning. It targets researchers and practitioners in NLP who need to deploy or fine-tune large language models with reduced memory footprints and improved performance.

How It Works

ALBERT employs two parameter-reduction techniques: cross-layer parameter sharing and factorized embedding parameterization. Cross-layer sharing reuses the same transformer weights in every layer, so the parameter count no longer grows with depth, which eases memory limitations and acts as a form of regularization. Factorized embedding parameterization decomposes the large vocabulary-embedding matrix into two smaller matrices, decoupling the embedding size from the hidden size and reducing parameters further.
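
For intuition, here is a minimal sketch (not from the ALBERT codebase) of why the factorized embedding helps; the sizes mirror the paper's 30k-token vocabulary, 128-dimensional embedding, and the 768-dimensional hidden size of the base model:

```python
# Illustrative parameter counts for factorized embedding parameterization.
# Sizes follow ALBERT's base configuration; this is not ALBERT code.
V = 30000  # vocabulary size
H = 768    # hidden size (ALBERT-base)
E = 128    # factorized embedding size

standard = V * H            # one V x H embedding matrix, as in BERT
factorized = V * E + E * H  # V x E lookup followed by an E x H projection

print(f"standard embedding  : {standard:,} parameters")
print(f"factorized embedding: {factorized:,} parameters")
```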

Quick Start & Requirements

  • Install: pip install -r albert/requirements.txt
  • Pre-trained Models: Available via TF-Hub (e.g., https://tfhub.dev/google/albert_base/1); a loading sketch follows this list.
  • Dependencies: TensorFlow 1.15 (required by the TF-Hub models), SentencePiece.
  • Resources: Pre-training requires substantial computational resources. Fine-tuning can be done on standard GPU setups.
  • Docs: Colab tutorial for fine-tuning.
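
The snippet below is a minimal loading sketch under TensorFlow 1.15 and tensorflow_hub; it assumes the ALBERT module exposes the same "tokens" signature as the original BERT TF-Hub modules (input_ids, input_mask, segment_ids in; pooled_output, sequence_output out), so verify against the module page before relying on it:

```python
# Minimal sketch: load the ALBERT TF-Hub module under TF 1.15.
# Assumes a BERT-style "tokens" signature; check the module page to confirm.
import tensorflow as tf        # TensorFlow 1.15
import tensorflow_hub as hub

max_seq_len = 128
input_ids = tf.placeholder(tf.int32, [None, max_seq_len], name="input_ids")
input_mask = tf.placeholder(tf.int32, [None, max_seq_len], name="input_mask")
segment_ids = tf.placeholder(tf.int32, [None, max_seq_len], name="segment_ids")

albert = hub.Module("https://tfhub.dev/google/albert_base/1", trainable=True)
outputs = albert(
    dict(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids),
    signature="tokens",
    as_dict=True,
)
pooled_output = outputs["pooled_output"]      # [batch, hidden] sentence-level features
sequence_output = outputs["sequence_output"]  # [batch, seq_len, hidden] token-level features
```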

Highlighted Details

  • Achieves state-of-the-art results on GLUE benchmarks, outperforming BERT-large and XLNet-large in many tasks.
  • Offers multiple model sizes (Base, Large, Xlarge, Xxlarge) with varying parameter counts and performance trade-offs.
  • Version 2 models incorporate "no dropout," "additional training data," and "long training time" strategies for improved performance.
  • Provides fine-tuning scripts for the GLUE, SQuAD (v1 & v2), and RACE datasets; a tokenization sketch follows this list.
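
The fine-tuning scripts expect input tokenized with the released SentencePiece model. A minimal tokenization sketch is below; the model path is illustrative, so substitute the .model file shipped with whichever checkpoint you download:

```python
# Minimal SentencePiece tokenization sketch (path is illustrative; use the
# .model file bundled with the ALBERT checkpoint or TF-Hub module you download).
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("albert_base/30k-clean.model")  # hypothetical path

text = "ALBERT shares parameters across all transformer layers."
print(sp.EncodeAsPieces(text))  # subword pieces
print(sp.EncodeAsIds(text))     # vocabulary ids fed to the model
```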

Maintenance & Community

  • Developed by Google Research.
  • Last updated March 2020.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The repository itself is not explicitly licensed in the README. However, the models are distributed via TF-Hub, which typically uses Apache 2.0 or similar permissive licenses.
  • Compatibility for commercial use is likely, but should be verified against the specific TF-Hub model licenses.

Limitations & Caveats

  • The README's latest news entry dates to March 2020 and recent commit activity is minimal, indicating potential staleness.
  • The TF-Hub models require TensorFlow 1.15, a legacy TF 1.x release.
  • Fine-tuning hyperparameters can be sensitive, as noted in the v2 model release regarding RACE performance.
Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

Explore Similar Projects

xlnet by zihangdai — language model research paper using generalized autoregressive pretraining; ~6k stars; created 6 years ago; updated 2 years ago.