Sub-quadratic architecture research paper
This repository provides the implementation for Monarch Mixer (M2), a novel architecture designed to overcome the quadratic complexity of standard Transformers in both sequence length and model dimension. It targets researchers and practitioners in NLP seeking efficient, high-quality language models, offering sub-quadratic scaling with Transformer-level performance.
How It Works
Monarch Mixer replaces the quadratic-cost Attention and MLP layers of Transformers with layers built from Monarch matrices. These structured matrices generalize FFTs, offering sub-quadratic complexity, hardware efficiency, and expressiveness. This approach allows for efficient mixing of information across both sequence and model dimensions, leading to models that scale more favorably with longer sequences and larger model sizes.
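As a rough illustration of the idea, below is a minimal PyTorch sketch of an order-2 Monarch matrix: a block-diagonal multiply, a permutation (transpose of a reshaped grid), and a second block-diagonal multiply. The class and parameter names, initialization, and permutation convention are illustrative only and do not mirror the repository's implementation.

```python
import torch
import torch.nn as nn


def blockdiag_matmul(x, w):
    """Multiply x by a block-diagonal matrix whose blocks are stored in w.

    x: (..., nblocks, block)   w: (nblocks, block, block)  ->  (..., nblocks, block)
    """
    return torch.einsum("bij,...bj->...bi", w, x)


class MonarchMatrix(nn.Module):
    """Sketch of an order-2 Monarch matrix on vectors of length n = sqrt_n ** 2.

    The map is: view the length-n axis as a sqrt_n x sqrt_n grid, mix along one
    axis with a block-diagonal matrix, transpose, mix along the other axis, and
    flatten. Cost is O(n * sqrt(n)) versus O(n^2) for a dense matrix.
    """

    def __init__(self, sqrt_n: int):
        super().__init__()
        self.sqrt_n = sqrt_n
        # Two sets of sqrt_n dense blocks, each sqrt_n x sqrt_n.
        self.L = nn.Parameter(torch.randn(sqrt_n, sqrt_n, sqrt_n) / sqrt_n)
        self.R = nn.Parameter(torch.randn(sqrt_n, sqrt_n, sqrt_n) / sqrt_n)

    def forward(self, x):
        b = self.sqrt_n
        x = x.view(*x.shape[:-1], b, b)                    # (..., b, b) grid
        x = blockdiag_matmul(x.transpose(-1, -2), self.L)  # mix along one axis
        x = blockdiag_matmul(x.transpose(-1, -2), self.R)  # mix along the other
        return x.reshape(*x.shape[:-2], b * b)             # back to length n


if __name__ == "__main__":
    m = MonarchMatrix(sqrt_n=32)      # operates on length-1024 vectors
    y = m(torch.randn(8, 1024))       # batch of 8
    print(y.shape)                    # torch.Size([8, 1024])
```

The same structured multiply can be applied along either the sequence axis or the model (feature) axis, which is how M2 replaces both the attention and MLP mixing steps.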
Quick Start & Requirements
See bert/EMBEDDINGS.md in the BERT folder for instructions on generating embeddings with the M2-BERT models.
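As a minimal sketch of that workflow, the snippet below loads an M2-BERT retrieval checkpoint from the Hugging Face Hub with the transformers library. The model ID, tokenizer choice, sequence length, and the 'sentence_embedding' output field follow the public model cards and are assumptions here; defer to bert/EMBEDDINGS.md for the repository's authoritative commands.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model ID and usage pattern (from the public M2-BERT model cards);
# check bert/EMBEDDINGS.md for the repository's own instructions.
model_id = "togethercomputer/m2-bert-80M-2k-retrieval"
max_seq_length = 2048

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, trust_remote_code=True  # M2-BERT ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased", model_max_length=max_seq_length
)

inputs = tokenizer(
    "Monarch Mixer scales sub-quadratically in sequence length.",
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=max_seq_length,
)
with torch.no_grad():
    outputs = model(**inputs)

# The custom modeling code exposes a 'sentence_embedding' field in its output
# (per the model cards; verify against EMBEDDINGS.md).
embedding = outputs["sentence_embedding"]
print(embedding.shape)  # expected (1, 768) for the 80M model
```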
Highlighted Details
Maintenance & Community
The project is associated with HazyResearch and has seen recent updates (January 2024), including new model releases and benchmark introductions. The cited papers list multiple authors affiliated with academic institutions.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.
Limitations & Caveats
The README focuses on model availability and performance claims, with limited detail on the core codebase's maturity for general training or fine-tuning beyond the provided M2-BERT variants. Specific hardware requirements for local training are not detailed.
Last updated 7 months ago; the repository is currently marked inactive.