Text encoder pre-training via GAN-like discriminator
ELECTRA offers a self-supervised method for pre-training transformer text encoders, designed for efficiency and state-of-the-art performance. It targets researchers and practitioners in NLP who need to pre-train or fine-tune language models for downstream tasks like classification, question answering, and sequence tagging. The core benefit is achieving strong results with significantly less compute than generator-based pre-training methods.
How It Works
ELECTRA trains models as discriminators that distinguish between "real" input tokens and "fake" tokens generated by a smaller, auxiliary network. This "replaced token detection" objective is more sample-efficient than traditional masked language modeling, allowing for faster pre-training and better performance with limited compute. The repository also includes code for "Electric," an energy-based variant for more principled negative sampling.
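The loop below is a minimal sketch of the replaced-token-detection objective, written in PyTorch for brevity rather than in the repository's TensorFlow code. The `generator`, `discriminator`, `MASK_ID`, and `LAMBDA` names are illustrative stand-ins (λ = 50 is the discriminator loss weight suggested in the ELECTRA paper), not identifiers from the repo.

```python
# Schematic replaced-token-detection training step (illustrative, not the repo's code).
# `generator` maps token ids to MLM logits [batch, seq, vocab];
# `discriminator` maps token ids to per-token logits [batch, seq].
import torch
import torch.nn.functional as F

MASK_ID = 103     # assumed [MASK] id in a BERT-style vocabulary (placeholder)
LAMBDA = 50.0     # discriminator loss weight from the ELECTRA paper

def replaced_token_detection_loss(generator, discriminator, input_ids, mask_prob=0.15):
    # 1. Mask out a random subset of input positions.
    mask = torch.rand(input_ids.shape, device=input_ids.device) < mask_prob
    masked_ids = input_ids.masked_fill(mask, MASK_ID)

    # 2. The small auxiliary generator proposes plausible tokens for the masked positions.
    gen_logits = generator(masked_ids)                        # [B, L, V]
    sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(mask, sampled, input_ids)

    # 3. The discriminator labels every token as original (0) or replaced (1).
    #    Tokens the generator happens to reproduce correctly count as original.
    labels = (corrupted != input_ids).float()
    disc_logits = discriminator(corrupted)                    # [B, L]
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, labels)

    # 4. The generator itself is trained with an ordinary MLM loss, not adversarially.
    gen_loss = F.cross_entropy(gen_logits[mask], input_ids[mask])

    return gen_loss + LAMBDA * disc_loss
```

Note that every position contributes to the discriminator loss, not just the ~15% of masked positions, which is the source of the sample-efficiency gain over masked language modeling.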
Quick Start & Requirements
- Data preparation: build_pretraining_dataset.py (requires BERT's vocabulary).
- Pre-training: run_pretraining.py; a small model trained on OpenWebText takes ~4 days on a V100 GPU.
- Fine-tuning: run_finetuning.py for tasks like GLUE, SQuAD, and sequence tagging (see the example invocations below).
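A typical end-to-end run is sketched below. Flag names follow the upstream README as recalled and should be verified against the repository; $DATA_DIR, the electra_small_owt model name, and the CoLA task are placeholder choices.

```
# Illustrative three-step run; check the repository README for the full flag set.
export DATA_DIR=/path/to/electra_data   # working directory (placeholder)

# 1. Build pre-training examples from a raw text corpus (needs a BERT vocab file).
python3 build_pretraining_dataset.py \
  --corpus-dir "$DATA_DIR/corpus" \
  --vocab-file "$DATA_DIR/vocab.txt" \
  --output-dir "$DATA_DIR/pretrain_tfrecords" \
  --max-seq-length 128

# 2. Pre-train a small ELECTRA model on the prepared data.
python3 run_pretraining.py --data-dir "$DATA_DIR" --model-name electra_small_owt

# 3. Fine-tune the pre-trained model on a downstream task (e.g. CoLA from GLUE).
python3 run_finetuning.py --data-dir "$DATA_DIR" --model-name electra_small_owt \
  --hparams '{"model_size": "small", "task_names": ["cola"]}'
```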
Highlighted Details
Maintenance & Community
Last update roughly 1 year ago; the project is listed as inactive.
Licensing & Compatibility
Limitations & Caveats