based by HazyResearch

Efficient linear attention language models balancing recall and throughput

Created 2 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

Summary

The HazyResearch/based repository provides code and pre-trained models for "Based" language models, an architecture designed to navigate the tradeoff between recall and throughput. It targets researchers and engineers who need efficient subquadratic models that capture both local and long-range dependencies, closing the performance gap with standard Transformers. The primary benefit is Transformer-like recall within a more computationally efficient framework.

How It Works

Based models combine two core ideas: short sliding-window attention for fine-grained local dependencies and "dense" global linear attention for long-range context. The hybrid applies exact softmax attention within the local window and a softmax-approximating linear attention (a Taylor expansion of the exponential) everywhere else. The result is a fully subquadratic architecture that effectively addresses the recall-throughput tradeoff, outperforming other subquadratic proposals.
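The global component can be sketched as causal linear attention with a second-order Taylor feature map, so that phi(q)·phi(k) ≈ 1 + q·k + (q·k)²/2 approximates exp(q·k). The toy NumPy version below (function names are illustrative, not the repository's API) shows how a running O(n) recurrence replaces the quadratic attention matrix:

```python
import numpy as np

def taylor_feature_map(x):
    """Second-order Taylor feature map: phi(q)·phi(k) = 1 + q·k + (q·k)^2 / 2."""
    d = x.shape[-1]
    ones = np.ones(x.shape[:-1] + (1,))
    # Flattened outer product gives the (q·k)^2 / 2 term when dotted.
    second = np.einsum("...i,...j->...ij", x, x).reshape(x.shape[:-1] + (d * d,)) / np.sqrt(2)
    return np.concatenate([ones, x, second], axis=-1)

def linear_attention(q, k, v):
    """Causal linear attention in O(n) time via running prefix sums."""
    fq, fk = taylor_feature_map(q), taylor_feature_map(k)
    n, d_v = v.shape
    state = np.zeros((fq.shape[-1], d_v))  # running sum of phi(k)^T v
    norm = np.zeros(fq.shape[-1])          # running sum of phi(k)
    out = np.empty_like(v)
    for t in range(n):
        state += np.outer(fk[t], v[t])
        norm += fk[t]
        out[t] = fq[t] @ state / (fq[t] @ norm + 1e-6)
    return out
```

The recurrent form computes exactly the same output as materializing the full n×n kernel-attention matrix, but its per-step cost is independent of sequence length, which is where the throughput gain comes from.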

Quick Start & Requirements

Installation requires cloning the repository, installing a specific PyTorch build (2.1.2 with CUDA 11.8 support), and then installing the package in editable mode (pip install -e .). Recommended environment: Python 3.8.18, PyTorch 2.1.2. A quick-start notebook (notebooks/03-24-quick-start.ipynb) and pre-trained models are available on HuggingFace.
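The steps above can be sketched as the following commands (the cu118 wheel index is PyTorch's standard CUDA 11.8 index; adjust it for your GPU setup):

```shell
# Clone the repository and enter it.
git clone https://github.com/HazyResearch/based.git
cd based

# Install the recommended PyTorch build with CUDA 11.8 wheels.
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# Install the package in editable mode.
pip install -e .
```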

Highlighted Details

  • Pre-trained checkpoints are available for 360M and 1.3B parameter scales, trained on 10B-50B tokens of The Pile corpus.
  • Models are evaluated on standard benchmarks and custom recall-intensive tasks: SWDE, FDA (information extraction), and SQUAD-Completion (document QA).
  • Integration with the ThunderKittens CUDA kernels is provided for accelerated performance.
  • The repository includes code for training new Based models and evaluating existing checkpoints.

Maintenance & Community

Compute resources for training were provided by Together.ai and Google Cloud Platform. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The README includes a GitHub license badge pointing to the HazyResearch/meerkat repository, but the specific license for the based repository itself is not explicitly stated within this README. Models are released strictly "for research" and are "not intended for use in any downstream applications," indicating significant restrictions for commercial or production deployment.

Limitations & Caveats

The released models are not instruction fine-tuned or audited, and are explicitly not intended for downstream applications. Users may encounter dependency issues related to the causal-conv1d interface.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
