Sub-quadratic architecture research paper
This repository provides the implementation for Monarch Mixer (M2), a novel architecture designed to overcome the quadratic complexity of standard Transformers in both sequence length and model dimension. It targets researchers and practitioners in NLP seeking efficient, high-quality language models, offering sub-quadratic scaling with Transformer-level performance.
How It Works
Monarch Mixer replaces the quadratic-cost Attention and MLP layers of Transformers with layers built from Monarch matrices. These structured matrices generalize FFTs, offering sub-quadratic complexity, hardware efficiency, and expressiveness. This approach allows for efficient mixing of information across both sequence and model dimensions, leading to models that scale more favorably with longer sequences and larger model sizes.
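As a rough illustration of the idea, below is a minimal PyTorch sketch of an order-2 Monarch matrix: a block-diagonal multiply, a permutation (transpose of a reshaped grid), and a second block-diagonal multiply. The class and parameter names, initialization, and permutation convention are illustrative only and do not mirror the repository's implementation.

```python
import torch
import torch.nn as nn


def blockdiag_matmul(x, w):
    """Multiply x by a block-diagonal matrix whose blocks are stored in w.

    x: (..., nblocks, block)   w: (nblocks, block, block)  ->  (..., nblocks, block)
    """
    return torch.einsum("bij,...bj->...bi", w, x)


class MonarchMatrix(nn.Module):
    """Sketch of an order-2 Monarch matrix on vectors of length n = sqrt_n ** 2.

    The map is: view the length-n axis as a sqrt_n x sqrt_n grid, mix along one
    axis with a block-diagonal matrix, transpose, mix along the other axis, and
    flatten. Cost is O(n * sqrt(n)) versus O(n^2) for a dense matrix.
    """

    def __init__(self, sqrt_n: int):
        super().__init__()
        self.sqrt_n = sqrt_n
        # Two sets of sqrt_n dense blocks, each sqrt_n x sqrt_n.
        self.L = nn.Parameter(torch.randn(sqrt_n, sqrt_n, sqrt_n) / sqrt_n)
        self.R = nn.Parameter(torch.randn(sqrt_n, sqrt_n, sqrt_n) / sqrt_n)

    def forward(self, x):
        b = self.sqrt_n
        x = x.view(*x.shape[:-1], b, b)                    # (..., b, b) grid
        x = blockdiag_matmul(x.transpose(-1, -2), self.L)  # mix along one axis
        x = blockdiag_matmul(x.transpose(-1, -2), self.R)  # mix along the other
        return x.reshape(*x.shape[:-2], b * b)             # back to length n


if __name__ == "__main__":
    m = MonarchMatrix(sqrt_n=32)      # operates on length-1024 vectors
    y = m(torch.randn(8, 1024))       # batch of 8
    print(y.shape)                    # torch.Size([8, 1024])
```

The same structured multiply can be applied along either the sequence axis or the model (feature) axis, which is how M2 replaces both the attention and MLP mixing steps.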
Quick Start & Requirements
See bert/EMBEDDINGS.md in the BERT folder for instructions on generating embeddings with the M2-BERT models.
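As a minimal sketch of that workflow, the snippet below loads an M2-BERT retrieval checkpoint from the Hugging Face Hub with the transformers library. The model ID, tokenizer choice, sequence length, and the 'sentence_embedding' output field follow the public model cards and are assumptions here; defer to bert/EMBEDDINGS.md for the repository's authoritative commands.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model ID and usage pattern (from the public M2-BERT model cards);
# check bert/EMBEDDINGS.md for the repository's own instructions.
model_id = "togethercomputer/m2-bert-80M-2k-retrieval"
max_seq_length = 2048

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, trust_remote_code=True  # M2-BERT ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased", model_max_length=max_seq_length
)

inputs = tokenizer(
    "Monarch Mixer scales sub-quadratically in sequence length.",
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=max_seq_length,
)
with torch.no_grad():
    outputs = model(**inputs)

# The custom modeling code exposes a 'sentence_embedding' field in its output
# (per the model cards; verify against EMBEDDINGS.md).
embedding = outputs["sentence_embedding"]
print(embedding.shape)  # expected (1, 768) for the 80M model
```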
Highlighted Details
Maintenance & Community
The project is associated with HazyResearch and has seen recent updates (January 2024), including new model releases and benchmark introductions. The cited papers list multiple authors affiliated with academic institutions.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.
Limitations & Caveats
The README focuses on model availability and performance claims, with limited detail on the core codebase's maturity for general training or fine-tuning beyond the provided M2-BERT variants. Specific hardware requirements for local training are not detailed.
Last updated 7 months ago; the repository is currently marked inactive.