This repository provides the TensorFlow code and pre-trained models for BERT (Bidirectional Encoder Representations from Transformers), a deep bidirectional Transformer encoder for language representation. It enables state-of-the-art performance on a wide array of Natural Language Processing tasks by pre-training on a large text corpus and then fine-tuning for specific downstream applications. The target audience includes NLP researchers and engineers looking to leverage powerful pre-trained language models.
How It Works
BERT is pre-trained with two unsupervised tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM masks 15% of the input tokens and trains the model to predict them, forcing it to learn deep bidirectional context. NSP trains the model to predict whether the second of two sentences actually follows the first in the original corpus. Unlike earlier unidirectional language models, BERT conditions on both left and right context at every layer, which yields richer contextual representations.
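As a rough illustration only (the repository's real pre-training data generation lives in `create_pretraining_data.py`), the MLM corruption step can be sketched as below; the function name, token list, and vocabulary are made up, and the 80/10/10 replacement split follows the paper.

```python
import random

def mask_for_mlm(tokens, vocab, mask_rate=0.15, seed=0):
    """Sketch of BERT's MLM corruption: select ~15% of positions; of those,
    replace 80% with [MASK], 10% with a random token, and keep 10% unchanged."""
    rng = random.Random(seed)
    tokens = list(tokens)
    targets = [None] * len(tokens)  # prediction targets at masked positions
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_rate:
            continue
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            tokens[i] = "[MASK]"
        elif r < 0.9:
            tokens[i] = rng.choice(vocab)
        # else: leave the original token in place
    return tokens, targets

# Toy WordPiece-style sequence; real inputs come from the repository's tokenizer.
masked, targets = mask_for_mlm(
    ["the", "man", "went", "to", "the", "store"],
    vocab=["the", "man", "went", "to", "store", "dog"],
)
```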
Quick Start & Requirements
- Install/Run: Primarily uses TensorFlow. Fine-tuning examples are provided via Python scripts (e.g., `run_classifier.py`, `run_squad.py`); see the sketch after this list.
- Prerequisites: TensorFlow 1.11.0 (the version the code was tested with) and Python 2 or 3. A GPU or Cloud TPU is recommended for fine-tuning, especially for BERT-Large.
- Resources: Fine-tuning BERT-Base on GLUE tasks can take minutes on a GPU. BERT-Large fine-tuning for SQuAD may require a Cloud TPU or careful memory management; the original experiments ran on a Cloud TPU with 64GB of device RAM.
- Links: TensorFlow Hub module and a Colab notebook (both linked from the repository README).
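As a quick orientation (and a strong simplification of what `run_classifier.py` does), the sketch below attaches a toy classification head using the repository's `modeling` API; the config values, input ids, and 2-class head are illustrative, and a real run would instead load `bert_config.json` and restore a released checkpoint.

```python
import tensorflow as tf  # TensorFlow 1.x
import modeling          # modeling.py from this repository

# Toy inputs that have already been converted to WordPiece ids.
input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])
input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])
segment_ids = tf.constant([[0, 0, 0], [0, 0, 0]])

# Illustrative small config; real runs use modeling.BertConfig.from_json_file.
config = modeling.BertConfig(vocab_size=32000, hidden_size=512,
                             num_hidden_layers=8, num_attention_heads=8,
                             intermediate_size=1024)

model = modeling.BertModel(config=config, is_training=True,
                           input_ids=input_ids, input_mask=input_mask,
                           token_type_ids=segment_ids)

# Fine-tuning adds a task-specific head on the pooled [CLS] representation.
pooled = model.get_pooled_output()         # shape [batch_size, hidden_size]
logits = tf.layers.dense(pooled, units=2)  # e.g., a 2-class head for MRPC
```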
Highlighted Details
- Offers a variety of pre-trained models: BERT-Base/Large (cased/uncased), Whole Word Masking variants, Multilingual, and Chinese models.
- Includes code for fine-tuning on tasks like SQuAD, MultiNLI, and MRPC, as well as for feature extraction (see the sketch after this list).
- Introduces smaller BERT models (BERT-Tiny, Mini, Small, Medium) for resource-constrained environments.
- Achieves state-of-the-art results on numerous NLP benchmarks, including SQuAD 1.1 and 2.0.
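For feature extraction the repository ships `extract_features.py`; the minimal sketch below shows the underlying idea with `tokenization.FullTokenizer` and `BertModel.get_sequence_output()`, which returns one contextual vector per token rather than the pooled [CLS] vector. The checkpoint directory path is a placeholder for a downloaded BERT-Base model.

```python
import tensorflow as tf  # TensorFlow 1.x
import modeling
import tokenization

BERT_DIR = "uncased_L-12_H-768_A-12"  # placeholder: path to a downloaded model

# WordPiece-tokenize a sentence and map it to vocabulary ids.
tokenizer = tokenization.FullTokenizer(
    vocab_file=BERT_DIR + "/vocab.txt", do_lower_case=True)
tokens = ["[CLS]"] + tokenizer.tokenize("BERT extracts contextual features.") + ["[SEP]"]
input_ids = tf.constant([tokenizer.convert_tokens_to_ids(tokens)])

# Build the model from the released config; restore the checkpoint before use.
config = modeling.BertConfig.from_json_file(BERT_DIR + "/bert_config.json")
model = modeling.BertModel(config=config, is_training=False, input_ids=input_ids)

# One contextual vector per WordPiece token: [batch, seq_len, hidden_size].
sequence_output = model.get_sequence_output()
```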
Maintenance & Community
- Developed by Google Research.
- Support is handled primarily through GitHub issues.
- Third-party PyTorch and Chainer implementations are available from HuggingFace and Sosuke Kobayashi, respectively.
Licensing & Compatibility
- Released under the Apache 2.0 license.
- Compatible with commercial use and closed-source linking.
Limitations & Caveats
- BERT-Large fine-tuning is memory-intensive and can exceed the 12-16GB of device RAM on typical consumer GPUs, necessitating gradient accumulation or gradient checkpointing (neither is implemented in this release; see the sketch after this list).
- Pre-training from scratch is computationally expensive and time-consuming.
- The original C++ pre-training code is not included.
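Although the release does not include gradient accumulation, the idea is straightforward to sketch in TensorFlow 1.x graph mode: sum gradients over several small batches, then apply them once. Everything below (the stand-in loss, variable names, accumulation interval of 4) is hypothetical and not part of the repository.

```python
import tensorflow as tf  # TensorFlow 1.x graph mode

# Stand-in loss; in practice this would be the BERT fine-tuning loss.
x = tf.placeholder(tf.float32, [None, 8])
w = tf.get_variable("w", [8, 2])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

opt = tf.train.AdamOptimizer(learning_rate=2e-5)
tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)

# One non-trainable accumulator per variable.
accum = [tf.Variable(tf.zeros_like(v), trainable=False) for v in tvars]
zero_ops = [a.assign(tf.zeros_like(a)) for a in accum]       # run once per cycle, before accumulating
accum_ops = [a.assign_add(g) for a, g in zip(accum, grads)]  # run for each micro-batch
scaled = [a / 4.0 for a in accum]                            # average over 4 micro-batches
apply_op = opt.apply_gradients(zip(scaled, tvars))           # run once per cycle, after accumulating
```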