bert by google-research

TensorFlow code and pre-trained models for BERT

created 6 years ago
39,378 stars

Top 0.7% on sourcepulse

Project Summary

This repository provides the TensorFlow code and pre-trained models for BERT (Bidirectional Encoder Representations from Transformers), a deep bidirectional Transformer encoder for language representation. It enables state-of-the-art performance on a wide array of Natural Language Processing tasks by pre-training on a large text corpus and then fine-tuning for specific downstream applications. The target audience includes NLP researchers and engineers looking to leverage powerful pre-trained language models.

How It Works

BERT is pre-trained on two unsupervised tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM masks 15% of the input tokens and trains the model to predict them, forcing it to learn deep bidirectional context; NSP trains the model to predict whether the second of two sentences actually follows the first in the original corpus. Unlike earlier unidirectional language models, BERT conditions on both left and right context at every layer, which yields richer contextual representations.
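
As a concrete illustration of the MLM objective, below is a minimal Python sketch of the token-masking step. It is a simplification of the repository's create_pretraining_data.py: the function name and the "at least one position" rule are choices made here for illustration, and the real pipeline additionally leaves 10% of chosen tokens unchanged and swaps another 10% for random words.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Pick ~15% of positions (at least one) and replace them with [MASK].

    Simplified sketch: the repo's create_pretraining_data.py also leaves 10%
    of the chosen tokens unchanged, swaps 10% for random words, and caps the
    number of predictions per sequence.
    """
    rng = rng or random.Random()
    candidates = [i for i, t in enumerate(tokens) if t not in ("[CLS]", "[SEP]")]
    num_to_mask = min(len(candidates), max(1, int(round(len(tokens) * mask_prob))))
    masked = list(tokens)
    targets = {}  # position -> original token the model is trained to predict
    for i in rng.sample(candidates, num_to_mask):
        targets[i] = masked[i]
        masked[i] = mask_token
    return masked, targets

tokens = "[CLS] the man went to the store [SEP]".split()
masked, targets = mask_tokens(tokens, rng=random.Random(1))
print(masked)   # one of the six word positions is now [MASK]
print(targets)  # maps that position back to the original word
```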

Quick Start & Requirements

  • Install/Run: Primarily uses TensorFlow. Fine-tuning examples are provided as Python scripts (e.g., run_classifier.py, run_squad.py); a minimal programmatic sketch follows this list.
  • Prerequisites: TensorFlow (tested with 1.11.0), Python 2/3. GPU or Cloud TPU recommended for fine-tuning, especially for BERT-Large.
  • Resources: Fine-tuning BERT-Base on GLUE tasks can take minutes on a GPU. BERT-Large fine-tuning for SQuAD may require Cloud TPUs or careful memory management due to high RAM requirements (64GB for original experiments).
  • Links: TensorFlow Hub module, Colab notebook ("BERT FineTuning with Cloud TPUs")
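
Beyond the ready-made scripts, the repository's modeling.py can be driven directly from Python. The TF 1.x sketch below assumes this repo is on the Python path and that inputs have already been converted to WordPiece token IDs; the tiny config values are illustrative only (real checkpoints ship a bert_config.json loadable with modeling.BertConfig.from_json_file).

```python
import tensorflow as tf  # TF 1.x, matching the repo's tested version
import modeling          # modeling.py from this repository

# Toy inputs: already WordPiece token ids, with padding masked out.
input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])
input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])
token_type_ids = tf.constant([[0, 0, 1], [0, 1, 0]])

# Deliberately tiny config so the sketch builds quickly; not a released model.
config = modeling.BertConfig(vocab_size=32000, hidden_size=128,
                             num_hidden_layers=2, num_attention_heads=2,
                             intermediate_size=512)

model = modeling.BertModel(config=config, is_training=True,
                           input_ids=input_ids, input_mask=input_mask,
                           token_type_ids=token_type_ids)

# [CLS]-pooled vector for sentence-level tasks ([batch_size, hidden_size]);
# model.get_sequence_output() gives per-token vectors for tagging/span tasks.
pooled_output = model.get_pooled_output()
```

To start from a released checkpoint instead, the fine-tuning scripts pass --init_checkpoint pointing at the downloaded bert_model.ckpt, together with the matching vocab.txt and bert_config.json.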

Highlighted Details

  • Offers a variety of pre-trained models: BERT-Base/Large (cased/uncased), Whole Word Masking variants, Multilingual, and Chinese models.
  • Includes code for fine-tuning on tasks like SQuAD, MultiNLI, and MRPC, as well as for feature extraction.
  • Introduces smaller BERT models (BERT-Tiny, Mini, Small, Medium) for resource-constrained environments.
  • Achieves state-of-the-art results on numerous NLP benchmarks, including SQuAD 1.1 and 2.0.

Maintenance & Community

  • Developed by Google Research.
  • Primarily maintained through GitHub issues for support.
  • Third-party PyTorch and Chainer implementations are available from HuggingFace and Sosuke Kobayashi, respectively.

Licensing & Compatibility

  • Released under the Apache 2.0 license.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • BERT-Large fine-tuning is memory-intensive and can exceed the 12–16 GB of memory on typical consumer GPUs; workarounds such as gradient accumulation or gradient checkpointing are not implemented in the released code (a generic accumulation sketch follows this list).
  • Pre-training from scratch is computationally expensive and time-consuming.
  • The original C++ pre-training code is not included.
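
For the memory caveat above, the usual workaround on limited hardware is gradient accumulation. The sketch below is a generic TF 1.x pattern, not code from this repository; the toy regression loss merely stands in for the fine-tuning loss built on top of BertModel's outputs.

```python
import tensorflow as tf  # TF 1.x; generic pattern, not from this repository

# Toy stand-in loss so the sketch is self-contained.
features = tf.placeholder(tf.float32, [None, 8])
labels = tf.placeholder(tf.float32, [None, 1])
weights = tf.get_variable("w", [8, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(features, weights) - labels))

accum_steps = 4  # micro-batches per effective batch
tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)

# One non-trainable accumulator per trainable variable.
accums = [tf.Variable(tf.zeros_like(v), trainable=False) for v in tvars]
zero_accums = tf.group(*[a.assign(tf.zeros_like(a)) for a in accums])
accumulate = tf.group(*[a.assign_add(g) for a, g in zip(accums, grads)])

optimizer = tf.train.AdamOptimizer(learning_rate=2e-5)
apply_accumulated = optimizer.apply_gradients(
    [(a / accum_steps, v) for a, v in zip(accums, tvars)])

# Per effective batch: run zero_accums once, run `accumulate` on accum_steps
# micro-batches, then run `apply_accumulated` once.
```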

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 404 stars in the last 90 days

Starred by Stas Bekman (Author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

Explore Similar Projects

xlnet by zihangdai

Language model research paper using generalized autoregressive pretraining
6k stars
created 6 years ago
updated 2 years ago