electra by google-research

Text encoder pre-training via GAN-like discriminator

created 5 years ago
2,358 stars

Top 19.9% on sourcepulse

Project Summary

ELECTRA offers a self-supervised method for pre-training transformer text encoders, designed for efficiency and state-of-the-art performance. It targets NLP researchers and practitioners who need to pre-train or fine-tune language models for downstream tasks such as classification, question answering, and sequence tagging. The core benefit is strong results with significantly less compute than masked language modeling pre-training methods such as BERT.

How It Works

ELECTRA trains models as discriminators that distinguish between "real" input tokens and "fake" tokens generated by a smaller, auxiliary network. This "replaced token detection" objective is more sample-efficient than traditional masked language modeling, allowing for faster pre-training and better performance with limited compute. The repository also includes code for "Electric," an energy-based variant for more principled negative sampling.
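
To make the objective concrete, below is a minimal NumPy sketch of how a replaced-token-detection training example is formed. It is an illustration, not the repository's code: the random sampler stands in for the small generator network, and the vocabulary size and 15% corruption rate are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    VOCAB_SIZE = 30522   # assumed: size of a BERT-style WordPiece vocabulary
    CORRUPT_RATE = 0.15  # assumed: fraction of positions the generator corrupts

    def make_rtd_example(token_ids):
        """Build (corrupted_input, labels) for replaced token detection.

        A real generator is a small masked LM; random sampling stands in
        for it here so the sketch stays self-contained.
        """
        token_ids = np.asarray(token_ids)
        corrupted = token_ids.copy()

        # Pick a subset of positions for the generator to corrupt.
        n_corrupt = max(1, int(CORRUPT_RATE * len(token_ids)))
        positions = rng.choice(len(token_ids), size=n_corrupt, replace=False)

        # The "generator" proposes replacement tokens at those positions.
        corrupted[positions] = rng.integers(0, VOCAB_SIZE, size=n_corrupt)

        # Discriminator targets: 1 = replaced, 0 = original. If the generator
        # happens to sample the original token, the position counts as original.
        labels = (corrupted != token_ids).astype(np.int64)
        return corrupted, labels

    tokens = rng.integers(0, VOCAB_SIZE, size=12)
    corrupted, labels = make_rtd_example(tokens)
    # The discriminator (the ELECTRA encoder) is trained to predict `labels`
    # at every position of `corrupted`.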

Quick Start & Requirements

  • Install: Requires Python 3, TensorFlow 1.15, NumPy, scikit-learn, and SciPy.
  • Pre-training:
    • Prepare data using build_pretraining_dataset.py (requires BERT's vocabulary); see the command sketch after this list.
    • Train using run_pretraining.py. A small model trained on OpenWebText takes ~4 days on a V100 GPU.
    • Pre-training data (tfrecords) requires ~30GB disk space.
  • Fine-tuning:
    • Download pre-trained models or train your own.
    • Fine-tune using run_finetuning.py for tasks like GLUE, SQuAD, and sequence tagging; see the sketch after this list.
  • Links: Official Paper, Electric Paper, BERT Vocabulary
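
A hedged sketch of the pre-training workflow. The flag names and $DATA_DIR layout below follow the repository's README for build_pretraining_dataset.py and run_pretraining.py, but exact options may differ between versions, so verify against the repo before running.

    # $DATA_DIR holds BERT's vocab.txt, the raw text corpus, and the output tfrecords.
    # Build pre-training tfrecords from raw text (flags approximate; check the README).
    python3 build_pretraining_dataset.py \
      --corpus-dir $DATA_DIR/corpus \
      --vocab-file $DATA_DIR/vocab.txt \
      --output-dir $DATA_DIR/pretrain_tfrecords \
      --max-seq-length 128 \
      --num-processes 5

    # Pre-train a small ELECTRA model on the resulting tfrecords
    # (~4 days on a single V100 for the small model, per the notes above).
    python3 run_pretraining.py \
      --data-dir $DATA_DIR \
      --model-name electra_small_owt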
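
Similarly, a hedged sketch of fine-tuning with run_finetuning.py: the hparams JSON (model size and task names) follows the README's convention, and "cola" is just an illustrative GLUE task.

    # Fine-tune a pre-trained checkpoint (downloaded or self-trained) on a GLUE task.
    python3 run_finetuning.py \
      --data-dir $DATA_DIR \
      --model-name electra_small \
      --hparams '{"model_size": "small", "task_names": ["cola"]}'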

Highlighted Details

  • ELECTRA-Large achieves an 85.2 GLUE score, outperforming ALBERT and XLNet.
  • ELECTRA-Base (82.7 GLUE) outperforms BERT-Large.
  • ELECTRA-Small (77.4 GLUE) offers competitive performance without distillation.
  • Supports fine-tuning on GLUE, SQuAD (1.1 & 2.0), MRQA, and sequence tagging tasks.

Maintenance & Community

  • Developed by Google Research.
  • Contact: Kevin Clark (kevclark@cs.stanford.edu) for direct correspondence; submit GitHub issues for support.

Licensing & Compatibility

  • Apache 2.0 License. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

  • Requires TensorFlow 1.15; TensorFlow 2.0 support is planned but not guaranteed.
  • The original pre-training dataset used in the paper is not publicly available, requiring users to source their own data or use alternatives like OpenWebText.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 7 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Alex Cheema (co-founder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Pretraining code for depth-recurrent language model research

Top 0.1% on sourcepulse · 806 stars · created 5 months ago · updated 2 weeks ago