m2 by HazyResearch

Sub-quadratic architecture research paper

created 2 years ago
555 stars

Top 58.6% on sourcepulse

Project Summary

This repository provides the implementation for Monarch Mixer (M2), a novel architecture designed to overcome the quadratic complexity of standard Transformers in both sequence length and model dimension. It targets researchers and practitioners in NLP seeking efficient, high-quality language models, offering sub-quadratic scaling with Transformer-level performance.

How It Works

Monarch Mixer replaces the quadratic-cost Attention and MLP layers of Transformers with layers built from Monarch matrices. These structured matrices generalize FFTs, offering sub-quadratic complexity, hardware efficiency, and expressiveness. This approach allows for efficient mixing of information across both sequence and model dimensions, leading to models that scale more favorably with longer sequences and larger model sizes.
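As an illustration of the idea (a minimal sketch, not the repository's optimized kernels, which fuse these steps for GPU efficiency), a Monarch matrix can be applied as two batched block-diagonal multiplications separated by a fixed reshape-and-transpose permutation, bringing an n x n mixing step down to roughly O(n^1.5) compute:

```python
import torch

def monarch_matmul(x, blkdiag1, blkdiag2):
    """Apply a Monarch matrix (two block-diagonal factors interleaved with a
    fixed reshape-and-transpose permutation) to a batch of vectors.

    x:        (batch, n) with n = m * m (square factorization assumed for simplicity)
    blkdiag1: (m, m, m) -- m blocks, each of shape (m, m)
    blkdiag2: (m, m, m)
    Cost is O(n * sqrt(n)) rather than the O(n^2) of a dense matmul.
    """
    batch, n = x.shape
    m = blkdiag1.shape[0]
    assert m * m == n, "this sketch assumes n is a perfect square"

    x = x.reshape(batch, m, m)                     # split into m chunks of size m
    x = torch.einsum('bki,kji->bkj', x, blkdiag1)  # first block-diagonal factor
    x = x.transpose(1, 2)                          # fixed permutation (transpose)
    x = torch.einsum('bki,kji->bkj', x, blkdiag2)  # second block-diagonal factor
    return x.transpose(1, 2).reshape(batch, n)     # undo permutation, flatten

# Example: mix length-1024 vectors with two sets of 32 blocks of size 32 x 32.
x = torch.randn(4, 1024)
b1 = torch.randn(32, 32, 32) / 32 ** 0.5
b2 = torch.randn(32, 32, 32) / 32 ** 0.5
print(monarch_matmul(x, b1, b2).shape)  # torch.Size([4, 1024])
```

In M2, layers built this way stand in for both the attention block (mixing along the sequence) and the MLP block (mixing along the model dimension).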

Quick Start & Requirements

  • Embeddings API: M2-BERT embedding models are available via the Together API; the README includes a Python snippet for querying the API for embeddings (a sketch follows this list).
  • Local Models: Instructions for evaluating and running models locally are available in bert/EMBEDDINGS.md.
  • Dependencies: Requires Python and standard ML libraries; specific requirements for local training and fine-tuning are detailed in the bert/ folder.
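A minimal sketch of such an embeddings query, using Together's OpenAI-compatible embeddings endpoint. The endpoint URL and the model identifier below are assumptions based on Together's public API; check bert/EMBEDDINGS.md and the README for the exact snippet and the model names currently served.

```python
import os
import requests

# Hypothetical example: request an embedding for one passage from an
# M2-BERT retrieval model served by Together. The model name is an
# assumption -- see the README for the identifiers actually available.
resp = requests.post(
    "https://api.together.xyz/v1/embeddings",
    headers={
        "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "togethercomputer/m2-bert-80M-8k-retrieval",
        "input": "Monarch Mixer replaces attention and MLPs with Monarch matrices.",
    },
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))  # embedding dimensionality
```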

Highlighted Details

  • M2-BERT-large (260M parameters) matches BERT-large GLUE performance with 24% fewer parameters.
  • M2-BERT-large (341M parameters) outperforms BERT-large on GLUE.
  • Offers long-context M2-BERT models (2k, 8k, 32k sequence lengths) and retrieval-focused embedding versions.
  • Introduces LoCo, a new benchmark for long-context retrieval.

Maintenance & Community

The project is associated with HazyResearch; its most recent updates (January 2024) added new model releases and the LoCo benchmark. Citations indicate contributions from multiple authors affiliated with academic institutions.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The README focuses on model availability and performance claims, with limited detail on the core codebase's maturity for general training or fine-tuning beyond the provided M2-BERT variants. Specific hardware requirements for local training are not detailed.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

9 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Didier Lopes (founder of OpenBB), and 11 more.

sentence-transformers by UKPLab

Framework for text embeddings, retrieval, and reranking

created 6 years ago
updated 3 days ago
17k stars

Top 0.2% on sourcepulse