Research repo for modernizing BERT via architecture and scaling
ModernBERT offers a modular and scalable approach to building Transformer encoder models, focusing on architectural improvements and efficient training. It's designed for researchers and practitioners aiming to develop state-of-the-art language models with enhanced performance and longer context capabilities.
How It Works
ModernBERT introduces FlexBERT, a flexible building block system for encoder architectures, configurable via YAML files. It builds upon MosaicBERT, integrating Flash Attention 2 for improved speed and memory efficiency. This modularity allows for easier experimentation with different architectural components and scaling strategies.
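As a rough illustration of what the Flash Attention 2 integration buys, the sketch below calls the flash_attn package's flash_attn_func kernel directly, using the library's (batch, seqlen, nheads, headdim) tensor convention. This is a minimal sketch, not ModernBERT's internal code, and it assumes a CUDA GPU with the flash_attn version from the quick start installed.

import torch
from flash_attn import flash_attn_func

# Shapes are (batch, seqlen, nheads, headdim); Flash Attention 2
# requires fp16 or bf16 tensors on a CUDA device.
q = torch.randn(2, 1024, 12, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused attention kernel: never materializes the full seqlen x seqlen
# attention matrix, which is where the speed and memory savings come from.
# causal=False gives the bidirectional attention an encoder needs.
out = flash_attn_func(q, k, v, causal=False)
print(out.shape)  # torch.Size([2, 1024, 12, 64])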
Quick Start & Requirements
conda env create -f environment.yaml
conda activate bert24
pip install "flash_attn==2.6.3"
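If the install succeeded, a quick sanity check (a sketch; assumes a CUDA-capable GPU is visible to PyTorch) is to import the package and confirm the pinned version:

import torch
import flash_attn

print(flash_attn.__version__)     # expect 2.6.3
print(torch.cuda.is_available())  # Flash Attention kernels require a CUDA GPU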
Maintenance & Community
ModernBERT is a collaboration between Answer.AI, LightOn, and friends. The repository is research-focused, with a HuggingFace collection available for easier integration. Further documentation and reproducibility materials are planned.
Licensing & Compatibility
The codebase builds upon MosaicBERT, which is under the Apache 2.0 license. This license permits commercial use and modification.
Limitations & Caveats
The README is noted as "very barebones and is still under construction." The StreamingTextDataset may exhibit uneven memory distribution across accelerators. Flash Attention installation can be complex, especially for specific GPU architectures.