albert by google-research

ALBERT is a "Lite BERT" for self-supervised language representation learning; this repository accompanies the research paper.

created 5 years ago
3,273 stars

Top 15.1% on sourcepulse

Project Summary

ALBERT is a "lite" version of BERT, offering parameter-reduction techniques for efficient language representation learning. It targets researchers and practitioners in NLP who need to deploy or fine-tune large language models with reduced memory footprints and improved performance.

How It Works

ALBERT employs two parameter-reduction techniques: cross-layer parameter sharing and factorized embedding parameterization. Cross-layer sharing reuses the same transformer weights in every layer, so the parameter count no longer grows with depth, which eases memory limitations and acts as a form of regularization. Factorized embedding parameterization decomposes the large vocabulary-embedding matrix into two smaller matrices, decoupling the embedding size from the hidden size and reducing parameters further.
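
For intuition, here is a minimal sketch (not from the ALBERT codebase) of why the factorized embedding helps; the sizes mirror the paper's 30k-token vocabulary, 128-dimensional embedding, and the 768-dimensional hidden size of the base model:

```python
# Illustrative parameter counts for factorized embedding parameterization.
# Sizes follow ALBERT's base configuration; this is not ALBERT code.
V = 30000  # vocabulary size
H = 768    # hidden size (ALBERT-base)
E = 128    # factorized embedding size

standard = V * H            # one V x H embedding matrix, as in BERT
factorized = V * E + E * H  # V x E lookup followed by an E x H projection

print(f"standard embedding  : {standard:,} parameters")
print(f"factorized embedding: {factorized:,} parameters")
```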

Quick Start & Requirements

  • Install: pip install -r albert/requirements.txt
  • Pre-trained Models: Available via TF-Hub (e.g., https://tfhub.dev/google/albert_base/1); a loading sketch follows this list.
  • Dependencies: TensorFlow 1.15 (required by the TF-Hub models), SentencePiece.
  • Resources: Pre-training requires substantial computational resources. Fine-tuning can be done on standard GPU setups.
  • Docs: Colab tutorial for fine-tuning.
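
The snippet below is a minimal loading sketch under TensorFlow 1.15 and tensorflow_hub; it assumes the ALBERT module exposes the same "tokens" signature as the original BERT TF-Hub modules (input_ids, input_mask, segment_ids in; pooled_output, sequence_output out), so verify against the module page before relying on it:

```python
# Minimal sketch: load the ALBERT TF-Hub module under TF 1.15.
# Assumes a BERT-style "tokens" signature; check the module page to confirm.
import tensorflow as tf        # TensorFlow 1.15
import tensorflow_hub as hub

max_seq_len = 128
input_ids = tf.placeholder(tf.int32, [None, max_seq_len], name="input_ids")
input_mask = tf.placeholder(tf.int32, [None, max_seq_len], name="input_mask")
segment_ids = tf.placeholder(tf.int32, [None, max_seq_len], name="segment_ids")

albert = hub.Module("https://tfhub.dev/google/albert_base/1", trainable=True)
outputs = albert(
    dict(input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids),
    signature="tokens",
    as_dict=True,
)
pooled_output = outputs["pooled_output"]      # [batch, hidden] sentence-level features
sequence_output = outputs["sequence_output"]  # [batch, seq_len, hidden] token-level features
```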

Highlighted Details

  • Achieves state-of-the-art results on GLUE benchmarks, outperforming BERT-large and XLNet-large in many tasks.
  • Offers multiple model sizes (Base, Large, Xlarge, Xxlarge) with varying parameter counts and performance trade-offs.
  • Version 2 models incorporate "no dropout," "additional training data," and "long training time" strategies for improved performance.
  • Provides fine-tuning scripts for the GLUE, SQuAD (v1 & v2), and RACE datasets; a tokenization sketch follows this list.
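
The fine-tuning scripts expect input tokenized with the released SentencePiece model. A minimal tokenization sketch is below; the model path is illustrative, so substitute the .model file shipped with whichever checkpoint you download:

```python
# Minimal SentencePiece tokenization sketch (path is illustrative; use the
# .model file bundled with the ALBERT checkpoint or TF-Hub module you download).
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("albert_base/30k-clean.model")  # hypothetical path

text = "ALBERT shares parameters across all transformer layers."
print(sp.EncodeAsPieces(text))  # subword pieces
print(sp.EncodeAsIds(text))     # vocabulary ids fed to the model
```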

Maintenance & Community

  • Developed by Google Research.
  • Last updated March 2020.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The repository itself is not explicitly licensed in the README. However, the models are distributed via TF-Hub, which typically uses Apache 2.0 or similar permissive licenses.
  • Compatibility for commercial use is likely, but should be verified against the specific TF-Hub model licenses.

Limitations & Caveats

  • The README's latest news entry dates to March 2020 and recent commit activity is minimal, indicating potential staleness.
  • The TF-Hub models require TensorFlow 1.15, a legacy TF 1.x release.
  • Fine-tuning hyperparameters can be sensitive, as noted in the v2 model release regarding RACE performance.
Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

Explore Similar Projects

xlnet by zihangdai — language model research paper using generalized autoregressive pretraining; ~6k stars; created 6 years ago; updated 2 years ago.