Bort, by Alexa

Companion code for the research paper on BERT subarchitecture extraction

Created 4 years ago · 473 stars


Project Summary

Bort provides an optimal subarchitecture for BERT, significantly reducing its size and computational requirements. This is achieved using a fully polynomial-time approximation scheme (FPTAS) for neural architecture search, making it suitable for researchers and practitioners seeking efficient NLP models.

How It Works

Bort extracts an optimal subset of BERT's architectural parameters, yielding a model that is 5.5% of the effective size of BERT-large (16% of its net size). The search uses an FPTAS to explore the space of subarchitectures efficiently, and the resulting model offers substantial inference speedups (7.9x on CPU versus BERT-base) along with a pre-training time of roughly 1.2% of that of RoBERTa-large.
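The net-size figure is easy to sanity-check from the parameter counts quoted in this summary (56M for Bort) and the commonly cited ~340M parameters for BERT-large; a minimal check:

```python
# Sanity-check the quoted size ratio from parameter counts.
# 56M for Bort is quoted below; ~340M for BERT-large is the commonly
# cited figure (an assumption, not stated in this summary).
bort_params = 56_000_000
bert_large_params = 340_000_000

net_ratio = bort_params / bert_large_params
print(f"Bort is {net_ratio:.1%} of BERT-large's net size")  # ~16.5%
```

The effective-size figure (5.5%) excludes the embedding layer, which is why it is so much smaller than the net-size ratio.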

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Tested with Python 3.6.5+.
  • Pre-training requires Horovod installed from source with MXNet and CUDA 10.1 support.
  • Download pre-trained model: aws s3 cp s3://alexa-saif-bort/bort.params model/
  • Download sample text for testing: wget https://github.com/dmlc/gluon-nlp/blob/v0.9.x/scripts/bert/sample_text.txt
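The steps above can be collected into a single setup script (a sketch; it assumes the AWS CLI is configured, the repository's requirements.txt is present, and — for pre-training — a CUDA 10.1 + MXNet environment with Horovod built from source):

```shell
# Install Python dependencies (Python 3.6.5+ per the README)
pip install -r requirements.txt

# Fetch the pre-trained parameters into a local model/ directory
# (mkdir is an assumption; the README only shows the copy command)
mkdir -p model
aws s3 cp s3://alexa-saif-bort/bort.params model/

# Grab sample text for a quick smoke test
wget https://github.com/dmlc/gluon-nlp/blob/v0.9.x/scripts/bert/sample_text.txt
```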

Highlighted Details

  • Achieves absolute performance improvements of 0.3% to 31% over BERT-large on NLU benchmarks.
  • Bort has 56M parameters, 4 layers, 8 attention heads, and a hidden size of 1024.
  • Offers significant speedups on CPU and reduced pre-training time.
  • Supports GLUE, SuperGLUE, and RACE datasets with specific data preparation steps.
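The architecture in the second bullet can be written down as a plain config (the dict keys are illustrative, in the style of common BERT-style configs; the values are the ones quoted above):

```python
# Bort's published hyperparameters; key names are hypothetical,
# mirroring typical BERT config files.
bort_config = {
    "num_hidden_layers": 4,
    "num_attention_heads": 8,
    "hidden_size": 1024,
}

# Each attention head therefore operates on a 1024 / 8 = 128-dim slice.
head_dim = bort_config["hidden_size"] // bort_config["num_attention_heads"]
print(head_dim)  # 128
```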

Maintenance & Community

The project is associated with research papers by Adrian de Wynter and Daniel J. Perry. No specific community channels or active maintenance signals are mentioned in the README.

Licensing & Compatibility

  • Licensed under the Apache-2.0 License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

Fine-tuning may yield odd results without an implementation of the Agora algorithm, which is referenced but not included. Out-of-memory errors can occur with large batch sizes or sequence lengths; reducing sequence length is recommended.
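The memory caveat can be handled mechanically. A minimal sketch of a halve-on-OOM retry loop (`run_step` is a hypothetical training-step callable; Bort's own scripts instead expose batch size and maximum sequence length as command-line flags):

```python
def fit_with_backoff(run_step, batch_size, min_batch=1):
    """Retry a training step, halving the batch size after an
    out-of-memory error (hypothetical helper, not part of Bort)."""
    while batch_size >= min_batch:
        try:
            # Succeeds once the batch fits in memory.
            return run_step(batch_size), batch_size
        except MemoryError:
            batch_size //= 2
    raise MemoryError("could not fit even the minimum batch size")
```

Reducing the maximum sequence length works the same way and usually frees more memory per step, since attention cost grows quadratically with sequence length.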

Health Check

  • Last commit: 3 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
