mamba by state-spaces

Mamba SSM architecture for sequence modeling

Created 1 year ago
15,881 stars

Top 3.0% on SourcePulse

View on GitHub
Project Summary

Mamba is a novel state space model (SSM) architecture designed for efficient sequence modeling, particularly on information-dense data where traditional Transformers can be computationally prohibitive. It targets researchers and engineers building large language models or other sequence-aware AI systems, offering linear-time scaling in sequence length as an alternative to the quadratic cost of Transformer attention.

How It Works

Mamba is built around a selective state space model (SSM): the SSM parameters are functions of the input, so the model can dynamically decide what to retain and what to ignore at each step. Because this input dependence breaks the fixed-convolution shortcut that earlier SSMs rely on, Mamba pairs it with a hardware-aware scan, inspired by FlashAttention, that fuses the recurrence computation into efficient GPU kernels. The result is a model that focuses on relevant information and filters out irrelevant context while scaling linearly with sequence length.
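
To make the selection mechanism concrete, the sketch below is a deliberately simplified, sequential PyTorch reference of a selective scan: the step size delta and the projections B and C vary per token, so the state update itself depends on the input. This is an illustrative reconstruction from the paper's description (the skip term and the fused, hardware-aware CUDA kernel are omitted), not the library's optimized implementation.

```python
# Simplified, sequential reference of a selective SSM scan (illustration only).
import torch

def selective_scan_ref(x, delta, A, B, C):
    """
    x:     (batch, length, d)  input sequence
    delta: (batch, length, d)  input-dependent step sizes
    A:     (d, n)              learned state matrix
    B, C:  (batch, length, n)  input-dependent projections
    returns y: (batch, length, d)
    """
    b, l, d = x.shape
    n = A.shape[1]
    h = torch.zeros(b, d, n, device=x.device, dtype=x.dtype)
    ys = []
    for t in range(l):
        dA = torch.exp(delta[:, t].unsqueeze(-1) * A)                # discretized A, (b, d, n)
        dBx = delta[:, t].unsqueeze(-1) * B[:, t].unsqueeze(1) * x[:, t].unsqueeze(-1)
        h = dA * h + dBx                                             # selective state update
        ys.append(torch.einsum("bdn,bn->bd", h, C[:, t]))            # readout with input-dependent C
    return torch.stack(ys, dim=1)
```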

Quick Start & Requirements

  • Install via pip: pip install mamba-ssm or pip install mamba-ssm[causal-conv1d].
  • Requirements: Linux, NVIDIA GPU, PyTorch 1.12+, CUDA 11.6+.
  • ROCm support for AMD GPUs is available with specific patching for ROCm 6.0.
  • Official documentation and examples are available within the repository; a minimal usage sketch follows below.
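
A minimal usage sketch for the standalone Mamba block, following the interface shown in the repository README (requires a CUDA GPU with mamba-ssm installed; argument names should be verified against the current README):

```python
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

model = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = model(x)  # (batch, length, dim) in, same shape out
assert y.shape == x.shape
```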

Highlighted Details

  • Implements both Mamba and Mamba-2 architectures.
  • Provides pretrained models on Hugging Face up to 2.8B parameters (see the loading sketch after this list).
  • Includes scripts for zero-shot evaluation using lm-evaluation-harness.
  • Offers generation benchmarking for latency and throughput.
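
As a hedged sketch of loading one of the Hugging Face checkpoints for generation, mirroring the repository's generation benchmark script (the class path, tokenizer, and generate arguments below are taken from that script and should be checked against the current code):

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# The released checkpoints reuse the GPT-NeoX tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b", device="cuda", dtype=torch.float16)

input_ids = tokenizer("Mamba is a state space model that", return_tensors="pt").input_ids.to("cuda")
out = model.generate(
    input_ids=input_ids,
    max_length=64,
    temperature=0.7,
    top_p=0.9,
    return_dict_in_generate=True,
)
print(tokenizer.batch_decode(out.sequences.tolist())[0])
```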

Maintenance & Community

  • Developed by Albert Gu and Tri Dao, authors of the Mamba papers.
  • Pretrained models are hosted on Hugging Face.
  • Citation details for the Mamba and Mamba-2 papers are provided.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Users should verify licensing terms.

Limitations & Caveats

  • Models are trained with PyTorch AMP; users may need to ensure compatible precision settings for stability (see the sketch after this list).
  • Initialization details might require framework-specific adjustments to avoid unintended parameter resets.
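
As an illustrative sketch of keeping parameters in float32 while running compute in lower precision, in line with the AMP caveat above (this shows standard PyTorch autocast usage, not the authors' exact training recipe):

```python
import torch
from mamba_ssm import Mamba

# Parameters stay in float32; autocast runs eligible ops (e.g. the linear
# projections) in bfloat16, mirroring a typical AMP setup.
model = Mamba(d_model=16, d_state=16, d_conv=4, expand=2).to("cuda")
x = torch.randn(2, 64, 16, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)

print(next(model.parameters()).dtype)  # torch.float32
print(y.dtype)                         # typically torch.bfloat16 under autocast
```
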
Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 13

Star History

  • 274 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0.2%
462
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0.4%
455
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 5 months ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

0.1%
6k
Inference optimization for LLMs on low-resource hardware
Created 2 years ago
Updated 2 weeks ago
Starred by Lianmin Zheng (Coauthor of SGLang, vLLM), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB

0.4%
8k
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 1 year ago
Updated 1 week ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 36 more.

unsloth by unslothai

0.6%
46k
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 1 year ago
Updated 12 hours ago