Sparse-attention transformer extends BERT-like models to longer sequences
Top 54.2% on sourcepulse
BigBird is a sparse-attention-based Transformer model designed to extend the capabilities of models like BERT to significantly longer sequences. It targets NLP researchers and practitioners working on tasks such as question answering and summarization, offering improved performance and reduced memory consumption compared to standard Transformers.
How It Works
BigBird employs a sparse attention mechanism, specifically a block-sparse attention pattern, which includes local, global, and random attention. This approach theoretically allows the model to handle the full context of a sequence, unlike other sparse attention methods that might miss information. The advantage lies in its ability to process much longer sequences efficiently, reducing memory overhead without compromising performance on tasks requiring extensive context.
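The following is a minimal, illustrative sketch in plain NumPy (not this repository's implementation; the function name and default values are hypothetical) of how global, sliding-window, and random blocks combine into a single block-sparse attention mask:

```python
import numpy as np

def block_sparse_mask(seq_len, block_size=64, num_global_blocks=2,
                      window=1, num_rand_blocks=3, seed=0):
    """Toy block-sparse mask combining global, local, and random attention."""
    rng = np.random.default_rng(seed)
    n_blocks = seq_len // block_size
    blocks = np.zeros((n_blocks, n_blocks), dtype=bool)

    # Global: the first few blocks attend to everything and are attended by everything.
    blocks[:num_global_blocks, :] = True
    blocks[:, :num_global_blocks] = True

    for i in range(n_blocks):
        # Local: each block attends to its neighbours within a sliding window.
        blocks[i, max(0, i - window):min(n_blocks, i + window + 1)] = True
        # Random: each block attends to a few randomly chosen blocks.
        blocks[i, rng.choice(n_blocks, size=num_rand_blocks, replace=False)] = True

    # Expand the block-level mask to token level.
    ones = np.ones((block_size, block_size), dtype=np.uint8)
    return np.kron(blocks.astype(np.uint8), ones).astype(bool)

mask = block_sparse_mask(seq_len=1024)
print(mask.shape, f"fraction of entries attended: {mask.mean():.2%}")
```

Because each query block attends to only a fixed number of other blocks, the number of attended entries grows roughly linearly with sequence length rather than quadratically.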
Quick Start & Requirements
Clone the repository, then install it in editable mode from the repository root:
pip3 install -e .
The imdb.ipynb notebook provides a runnable quick-start example.
Highlighted Details
Three attention implementations are available: original_full, simulated_sparse, and block_sparse.
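As a sketch of how an attention mode is typically selected in practice, the example below uses the Hugging Face transformers port of BigBird, which is a separate implementation from this repository and exposes only the original_full and block_sparse modes:

```python
from transformers import BigBirdConfig, BigBirdModel

# Configure BigBird with block-sparse attention for long inputs.
config = BigBirdConfig(
    attention_type="block_sparse",   # or "original_full" for standard attention
    block_size=64,                   # tokens per attention block
    num_random_blocks=3,             # random blocks each query block attends to
    max_position_embeddings=4096,    # long-sequence capacity
)
model = BigBirdModel(config)
print(model.config.attention_type)
```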
Maintenance & Community
The repository was last updated 2 years ago and is marked inactive.
Licensing & Compatibility
Limitations & Caveats
For shorter sequences, using original_full attention is advised, as BigBird's sparse attention offers no benefit at those lengths.
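As an illustration of this advice, the sketch below (again via the Hugging Face transformers port rather than this repository's own API) loads a pretrained BigBird checkpoint with full attention for a short input:

```python
import torch
from transformers import BigBirdModel, BigBirdTokenizer

# For short sequences, fall back to full (quadratic) attention.
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="original_full",
)

inputs = tokenizer("A short input gains nothing from sparse attention.",
                   return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
print(hidden.shape)
```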