ModernBERT by AnswerDotAI

Research repo for modernizing BERT via architecture and scaling

created 1 year ago
1,468 stars

Top 28.5% on sourcepulse

View on GitHub
Project Summary

ModernBERT offers a modular and scalable approach to building Transformer encoder models, focusing on architectural improvements and efficient training. It's designed for researchers and practitioners aiming to develop state-of-the-art language models with enhanced performance and longer context capabilities.

How It Works

ModernBERT introduces FlexBERT, a flexible building block system for encoder architectures, configurable via YAML files. It builds upon MosaicBERT, integrating Flash Attention 2 for improved speed and memory efficiency. This modularity allows for easier experimentation with different architectural components and scaling strategies.
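
As a rough illustration of the YAML-driven design, here is a minimal sketch of loading a config and handing it to a builder. The config keys and the build_model() helper are hypothetical stand-ins for the FlexBERT idea, not the repository's actual schema or factory functions:

```python
# Minimal sketch of a YAML-driven encoder build (FlexBERT idea).
# The config keys and build_model() helper are hypothetical, not
# the repository's actual schema or factory functions.
import yaml

CONFIG = """
model:
  name: flex_bert
  num_hidden_layers: 12
  hidden_size: 768
  attention_layer: flash_attention_2   # hypothetical key name
  max_position_embeddings: 8192
"""

def build_model(cfg: dict) -> None:
    # Placeholder: a real factory would map these keys onto concrete
    # encoder building blocks (attention, MLP, embeddings, etc.).
    print(f"Building {cfg['name']} with {cfg['num_hidden_layers']} layers "
          f"and {cfg['attention_layer']} attention")

cfg = yaml.safe_load(CONFIG)
build_model(cfg["model"])
```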

Quick Start & Requirements

  • Install via Conda: conda env create -f environment.yaml
  • Activate environment: conda activate bert24
  • Flash Attention: pip install "flash_attn==2.6.3"; Hopper GPUs instead require building from source or installing precompiled wheels (a short environment sanity check follows this list).
  • A GPU-equipped machine is required.
  • Setup time and resource requirements depend on model size and dataset.
  • For details, see the ModernBERT Collection on HuggingFace and the arXiv preprint.
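
Before launching a run, a short sanity check along these lines can confirm that the GPU and the Flash Attention install are usable (this sketch assumes PyTorch is already in the environment):

```python
# Quick sanity check that the environment has a usable GPU and that
# Flash Attention imported correctly after installation.
import torch

assert torch.cuda.is_available(), "ModernBERT training requires a GPU"
print("GPU:", torch.cuda.get_device_name(0))

try:
    import flash_attn
    print("flash_attn version:", flash_attn.__version__)
except ImportError:
    print("flash_attn not installed; see the install options above")
```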

Highlighted Details

  • Modular encoder design (FlexBERT) configurable via YAML.
  • Leverages Composer framework for training.
  • Supports both raw text and pre-tokenized data formats (MDS, CSV/TSV, JSONL); a loading sketch follows this list.
  • Includes scripts for fine-tuning and evaluating retrieval models (ColBERT, Sentence Transformers).
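
For the tokenized MDS path, a minimal loading sketch using MosaicML's streaming library (the usual companion to Composer) might look like the following; the shard paths and the field names inside each sample are placeholder assumptions:

```python
# Sketch of reading a tokenized MDS shard directory with MosaicML's
# `streaming` library. Paths and sample fields are assumptions.
from torch.utils.data import DataLoader
from streaming import StreamingDataset

dataset = StreamingDataset(
    local="/tmp/mds-cache",          # placeholder local cache dir
    remote="s3://bucket/train-mds",  # placeholder remote shard dir
    shuffle=True,
    batch_size=32,
)
loader = DataLoader(dataset, batch_size=32)

for batch in loader:
    print(batch.keys())  # whatever tokenized fields the shards store
    break
```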

Maintenance & Community

ModernBERT is a collaboration between Answer.AI, LightOn, and friends. The repository is research-focused, with a HuggingFace collection available for easier integration; further documentation and reproducibility materials are planned.
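
As a quick integration check, the HuggingFace collection can be exercised with a minimal fill-mask sketch like the one below. It assumes the answerdotai/ModernBERT-base checkpoint id and a transformers release recent enough to include the architecture:

```python
# Minimal fill-mask sketch against the HuggingFace collection.
# Assumes the answerdotai/ModernBERT-base checkpoint id and a
# sufficiently recent transformers release.
from transformers import pipeline

fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```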

Licensing & Compatibility

The codebase builds upon MosaicBERT, which is under the Apache 2.0 license. This license permits commercial use and modification.

Limitations & Caveats

By its authors' own admission, the README is "very barebones and is still under construction." The StreamingTextDataset may distribute memory unevenly across accelerators, and Flash Attention installation can be complex, particularly for specific GPU architectures.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 130 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

Top 1.0%
402 stars
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

Top 0.3%
1k stars
Transformer library for flexible model development
created 3 years ago
updated 7 months ago