ConvBERT by yitu-opensource

ConvBERT: research code for improving BERT with span-based dynamic convolution

created 5 years ago
251 stars

Top 99.8% on sourcepulse

View on GitHub
Project Summary

ConvBERT introduces a novel architecture for pre-trained language models, enhancing BERT with span-based dynamic convolution. This approach aims to improve performance and efficiency for natural language processing tasks, targeting researchers and practitioners in the field.

How It Works

ConvBERT replaces part of BERT's self-attention with span-based dynamic convolution. Instead of using fixed convolution kernels, each position's kernel is generated from a local span of surrounding tokens, so local dependencies are captured by lightweight convolution heads while the remaining self-attention heads handle global context. This mixed-attention design aims to reduce computation relative to full self-attention while preserving accuracy on language-understanding tasks; a simplified sketch follows.
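The snippet below is an illustrative, single-head NumPy sketch of that idea, not the repository's TensorFlow implementation; names such as span_dynamic_conv, w_span, and w_kernel are invented for readability. It shows the core mechanic: a per-position kernel is generated from a local span of tokens and then applied as a softmax-normalised depth-wise convolution over the value vectors.

```python
# Illustrative sketch of span-based dynamic convolution (not the repo's code).
# Kernels are generated from a local span of the input rather than a single
# token, then applied as a lightweight (softmax-normalised) convolution.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def span_dynamic_conv(x, w_q, w_v, w_span, w_kernel, kernel_size=3):
    """x: (seq_len, d). Returns (seq_len, d)."""
    seq_len, d = x.shape
    pad = kernel_size // 2
    q = x @ w_q                      # queries, (seq_len, d)
    v = x @ w_v                      # values,  (seq_len, d)

    # 1) Span summary: a depth-wise convolution over a local window of tokens,
    #    so the kernel generator conditions on a span instead of one position.
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    span = np.stack([(x_pad[i:i + kernel_size] * w_span).sum(axis=0)
                     for i in range(seq_len)])           # (seq_len, d)

    # 2) Dynamic kernel per position, conditioned on query * span summary,
    #    normalised with softmax over the kernel taps.
    kernels = softmax((q * span) @ w_kernel, axis=-1)    # (seq_len, kernel_size)

    # 3) Apply each position's kernel to its window of value vectors.
    v_pad = np.pad(v, ((pad, pad), (0, 0)))
    out = np.stack([kernels[i] @ v_pad[i:i + kernel_size]
                    for i in range(seq_len)])            # (seq_len, d)
    return out

# Toy usage with random weights.
rng = np.random.default_rng(0)
d, k = 8, 3
x = rng.normal(size=(5, d))
out = span_dynamic_conv(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                        rng.normal(size=(k, d)), rng.normal(size=(d, k)), k)
print(out.shape)  # (5, 8)
```

In the actual model these convolution heads sit alongside ordinary self-attention heads in a mixed-attention block, which is where the efficiency gain comes from.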

Quick Start & Requirements

  • Install: Clone the repository and install dependencies: pip install tensorflow==1.15 numpy scikit-learn.
  • Prerequisites: TensorFlow 1.15, NumPy, scikit-learn. Tested on a V100 GPU (a quick environment-check sketch follows this list).
  • Pre-training: Requires downloading the OpenWebText corpus (12GB), processing it (approx. 30GB disk space), and running bash build_data.sh followed by bash pretrain.sh.
  • Fine-tuning: Download GLUE data and run bash finetune.sh. A Google Colab notebook is available for a quick example.
  • Pre-trained Model: Available for download.
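Because the scripts target an old TensorFlow release, a small sanity check like the following (illustrative only, not part of the repository) can confirm the environment before launching the long pre-training run:

```python
# Quick environment check before running build_data.sh / pretrain.sh / finetune.sh.
# Illustrative only; the repository itself is driven entirely by the bash scripts above.
import numpy as np
import sklearn
import tensorflow as tf

assert tf.__version__.startswith("1.15"), "ConvBERT's scripts target TensorFlow 1.15"
print("TensorFlow:", tf.__version__)
print("NumPy:", np.__version__, "| scikit-learn:", sklearn.__version__)
print("GPU available:", tf.test.is_gpu_available())  # the authors report testing on a V100
```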

Highlighted Details

  • Introduces span-based dynamic convolution for improved language model pre-training.
  • Achieves competitive performance on GLUE benchmarks.
  • Codebase is based on ELECTRA.

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is presented for research purposes.

Limitations & Caveats

The project requires TensorFlow 1.15, which is an older version and may present compatibility challenges with modern TensorFlow ecosystems. Pre-training requires significant data and disk space.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
