HazyResearch/based: Efficient linear attention language models balancing recall and throughput
Top 99.8% on SourcePulse
Summary
The HazyResearch/based repository provides code and pre-trained models for "Based" language models, an architecture designed to balance the recall-throughput tradeoff. It targets researchers and engineers seeking efficient subquadratic models that can capture both local and long-range dependencies, aiming to bridge the performance gap with traditional Transformers. The primary benefit is achieving Transformer-like recall capabilities within a more computationally efficient framework.
How It Works
Based models combine two core ideas: short sliding-window attention for fine-grained local dependencies and "dense" global linear attention for long-range context. This hybrid applies exact softmax attention locally and a softmax-approximating linear attention globally. The design yields a fully subquadratic architecture that effectively navigates the recall-throughput tradeoff, outperforming other subquadratic proposals.
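As a rough illustration of the global component, below is a minimal sketch of causal linear attention using a second-order Taylor feature map to approximate softmax. This is a toy reconstruction under stated assumptions, not the repository's implementation; all names, dimensions, and the normalizer epsilon are illustrative.

```python
import torch

def taylor_feature_map(x):
    # phi(x) = [1, x, (x ⊗ x)/sqrt(2)]: the terms of a 2nd-order Taylor
    # expansion of exp(q·k). x has shape (batch, seq, d).
    b, t, d = x.shape
    x2 = torch.einsum('bti,btj->btij', x, x).reshape(b, t, d * d) / (2 ** 0.5)
    ones = torch.ones(b, t, 1, device=x.device, dtype=x.dtype)
    return torch.cat([ones, x, x2], dim=-1)          # (b, t, 1 + d + d^2)

def causal_linear_attention(q, k, v):
    # O_t = phi(q_t) S_t / (phi(q_t) · z_t), where the running state
    # S_t = sum_{s<=t} phi(k_s)^T v_s and z_t = sum_{s<=t} phi(k_s)
    # are computed with cumulative sums, giving O(t) rather than O(t^2) work.
    fq, fk = taylor_feature_map(q), taylor_feature_map(k)
    kv = torch.einsum('btf,btd->btfd', fk, v).cumsum(dim=1)  # running phi(k)^T v
    z = fk.cumsum(dim=1)                                     # running normalizer
    num = torch.einsum('btf,btfd->btd', fq, kv)
    den = torch.einsum('btf,btf->bt', fq, z).unsqueeze(-1)
    return num / (den + 1e-6)  # epsilon guards a small/zero denominator

q, k, v = (torch.randn(1, 8, 4) for _ in range(3))
out = causal_linear_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 4])
```

The cumulative-sum formulation is what makes the global attention subquadratic: the state carried across positions has fixed size, independent of sequence length.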
Quick Start & Requirements
Installation requires cloning the repository, installing specific PyTorch versions (2.1.2 with CUDA 11.8 support), and then installing the package in editable mode (pip install -e .). Recommended environment: Python 3.8.18, PyTorch 2.1.2. A quick-start notebook (notebooks/03-24-quick-start.ipynb) and pre-trained models are available on HuggingFace.
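The installation steps above can be sketched as follows (versions follow the README's recommendation; the CUDA 11.8 wheel index URL is the standard PyTorch convention and is an assumption here):

```shell
# Clone the repository
git clone https://github.com/HazyResearch/based.git
cd based

# Install the recommended PyTorch build (2.1.2, CUDA 11.8 wheels — URL assumed)
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# Install the package in editable mode
pip install -e .
```

From there, the quick-start notebook (notebooks/03-24-quick-start.ipynb) demonstrates loading the pre-trained models from HuggingFace.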
Highlighted Details
Maintenance & Community
Compute resources for training were provided by Together.ai and Google Cloud Platform. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README.
Licensing & Compatibility
The README includes a GitHub license badge pointing to the HazyResearch/meerkat repository, but the specific license for the based repository itself is not explicitly stated within this README. Models are released strictly "for research" and are "not intended for use in any downstream applications," indicating significant restrictions for commercial or production deployment.
Limitations & Caveats
The released models are not instruction fine-tuned or audited, and are explicitly not intended for downstream applications. Users may encounter dependency issues related to the causal-conv1d interface.