SpanBERT by facebookresearch

Code and pre-trained models for SpanBERT, a BERT pre-training method built around representing and predicting spans

created 6 years ago
899 stars

Top 41.2% on sourcepulse

Project Summary

SpanBERT provides code and pre-trained models for a BERT variant that is pre-trained to represent and predict spans of text rather than individual tokens. It is aimed at NLP researchers and practitioners, and it reports gains over standard BERT on span-centric tasks such as question answering and relation extraction.

How It Works

SpanBERT changes BERT pre-training in two ways: span masking and the Span Boundary Objective (SBO). Span masking hides contiguous random spans of tokens instead of isolated tokens. The SBO then trains the model to predict every token in a masked span using only the representations of the tokens at the span's boundaries plus a position embedding, which pushes span-level information into the boundary tokens. SpanBERT also drops BERT's next sentence prediction objective and trains on single contiguous segments.
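
The span-masking step can be illustrated with a short Python sketch. It is not the repository's implementation: the function names are hypothetical, and it works on raw token positions rather than whole words. The defaults follow the values given in the SpanBERT paper (span lengths drawn from a geometric distribution with p = 0.2, capped at 10 tokens, until about 15% of the sequence is masked). Putting all leftover probability mass on the cap is a simplification made here.

```python
import random


def sample_span_length(rng, p=0.2, max_len=10):
    """Draw a span length from a geometric distribution, capped at max_len."""
    length = 1
    # Keep extending the span with probability (1 - p).
    while length < max_len and rng.random() > p:
        length += 1
    return length


def sample_span_mask(num_tokens, mask_ratio=0.15, p=0.2, max_len=10, seed=0):
    """Return sorted token positions to mask, chosen as non-overlapping spans."""
    rng = random.Random(seed)
    budget = max(1, round(num_tokens * mask_ratio))
    masked = set()
    while len(masked) < budget:
        # Never exceed the remaining masking budget.
        length = min(sample_span_length(rng, p, max_len), budget - len(masked))
        start = rng.randrange(0, num_tokens - length + 1)
        span = range(start, start + length)
        if any(i in masked for i in span):
            continue  # overlaps an earlier span, so resample
        masked.update(span)
    return sorted(masked)


if __name__ == "__main__":
    print(sample_span_mask(100))
```

Calling sample_span_mask(100) returns 15 positions grouped into contiguous runs. In actual pre-training, each selected span would then be replaced with mask tokens, random tokens, or left unchanged, and the model would be trained to recover the original tokens.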

Quick Start & Requirements

  • Requires NVIDIA Apex at a specific earlier commit (NVIDIA/apex@4a8c4ac), not the latest version.
  • Requires Python and PyTorch.
  • Fine-tuning scripts are provided for SQuAD, TACRED, MRQA, and GLUE tasks.
  • Pre-trained models (base and large, cased) are available and compatible with HuggingFace BERT formats.
  • Download fine-tuned models using ./code/download_finetuned.sh <model_dir> <task>.

Highlighted Details

  • Reported state-of-the-art results at the time of publication on SQuAD 1.1 (94.6 F1), SQuAD 2.0 (88.7 F1), and coreference resolution.
  • Offers both 110M (base) and 340M (large) parameter models.
  • Fine-tuning code supports multiple NLP benchmarks including SQuAD, TACRED, MRQA, and GLUE.
  • Coreference resolution fine-tuning code is in a separate TensorFlow repository.
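
For readers unfamiliar with the F1 figures quoted for SQuAD, the metric is a token-overlap score between the predicted and reference answer strings. The sketch below follows the normalization steps of the standard SQuAD evaluation (lowercasing, stripping punctuation and articles, collapsing whitespace). It is an illustration with hypothetical function names, not the script used to produce the numbers above.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, reference):
    """Token-level F1 between one predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score takes, for each question, the best F1 over all reference answers, averages over questions, and reports the result as a percentage.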

Maintenance & Community

  • Developed by Facebook AI Research (FAIR).
  • The project lists contact points for questions and encourages GitHub issues.

Licensing & Compatibility

  • License: CC-BY-NC 4.0.
  • This non-commercial license applies to both the code and pre-trained models, restricting commercial use.

Limitations & Caveats

  • The Apex dependency requires a specific, older commit, which may pose installation challenges.
  • Coreference resolution implementation is in a separate TensorFlow repository, not directly integrated here.
  • The CC-BY-NC 4.0 license prohibits commercial use.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
