mamba2-minimal by tommyip

Minimal Mamba-2 implementation for efficient sequence modeling

Created 2 years ago

255 stars

Top 98.8% on SourcePulse

Project Summary

A minimal, single-file PyTorch implementation of the Mamba-2 State Space Model (SSM) architecture. Mamba-2 avoids the quadratic cost of Transformer self-attention: training scales linearly with sequence length, and inference takes constant time per step. This makes the architecture attractive to researchers and practitioners looking for efficient foundation models.

How It Works

This project implements Mamba-2, an SSM variant that restricts the state-transition matrix A to a scalar multiple of the identity. This restriction permits much larger state dimensions and faster training than Mamba-1. Like other SSMs, it maps an input sequence to an output sequence through a recurrent hidden state, which keeps computation and memory costs low on long sequences.
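
The hidden-state mapping can be illustrated with a small sketch. This is not code from the repository: it is a pure-Python, single-channel toy, and the names ssm_recurrent and ssm_quadratic are hypothetical. It assumes a scalar decay a_t per step and compares the step-by-step recurrence (constant work per step) with an equivalent quadratic, attention-like sum over past positions.

```python
import math
import random


def ssm_recurrent(x, a, B, C):
    """Recurrence: h_t = a_t * h_{t-1} + B_t * x_t, then y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])  # hidden state of size N, starts at zero
    ys = []
    for x_t, a_t, B_t, C_t in zip(x, a, B, C):
        # Scalar decay a_t multiplies every state entry (assumed structure).
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def ssm_quadratic(x, a, B, C):
    """Same map, written as y_t = sum_{s<=t} (a_{s+1}..a_t) <C_t, B_s> x_s."""
    ys = []
    for t in range(len(x)):
        y_t = 0.0
        for s in range(t + 1):
            decay = math.prod(a[s + 1 : t + 1])  # empty product is 1 when s == t
            score = sum(c * b for c, b in zip(C[t], B[s]))
            y_t += decay * score * x[s]
        ys.append(y_t)
    return ys


# Random toy inputs: sequence length T, state size N (illustrative values).
random.seed(0)
T, N = 6, 4
x = [random.gauss(0, 1) for _ in range(T)]
a = [random.uniform(0.5, 1.0) for _ in range(T)]
B = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
C = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]

y_rec = ssm_recurrent(x, a, B, C)
y_quad = ssm_quadratic(x, a, B, C)
print(max(abs(u - v) for u, v in zip(y_rec, y_quad)))  # largest disagreement
```

The recurrent form does a fixed amount of work per step, which corresponds to the constant per-step inference cost; the quadratic form touches every earlier position and is shown only to check that both compute the same outputs.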

Quick Start & Requirements

Install dependencies using pip install -r requirements.txt.
Requires PyTorch, einops, and transformers.
Tested on CPU and MPS (Metal Performance Shaders) backends.
A demo.ipynb notebook demonstrates usage with pretrained weights for text generation.
See the Mamba-2 paper for architectural details.
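
Since the project is tested on CPU and MPS, a script might choose between the two at startup. The helper below is a hedged sketch, not part of the repository: pick_device is a hypothetical name, and it falls back to "cpu" when PyTorch is not installed or MPS is unavailable.

```python
def pick_device() -> str:
    """Return "mps" when PyTorch reports an available MPS backend, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, just report "cpu".
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

The returned string can be passed wherever PyTorch accepts a device, for example tensor.to(device).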

Highlighted Details

Single-file, minimal implementation for clarity and ease of integration.
Device-agnostic design supporting CPU and Apple Silicon (MPS).
Core Mamba-2 model implementation provided, with usage examples in demo.ipynb.
Output logits follow the same distribution as the reference implementation but are not numerically equivalent.

Maintenance & Community

The project is inspired by johnma2006/mamba-minimal and implements the Mamba-2 architecture by Gu and Dao. No specific community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The README does not specify a license. This omission requires clarification for any potential use, especially commercial applications.

Limitations & Caveats

The code contains open TODOs, including possibly removing the einops dependency if readability can be preserved. The README explicitly states that the output logits are not numerically equivalent to those of the reference Mamba-2 implementation.
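
One way to reason about logits that agree in distribution but not bit-for-bit is to compare them with tolerance-based measures rather than exact equality. The sketch below is an assumption about how such a check could look, not the project's own test: compare_logits is a hypothetical helper, and the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def compare_logits(ours, ref):
    """Summarize how far two logit vectors are from each other."""
    p, q = softmax(ours), softmax(ref)
    return {
        "max_abs_logit_diff": max(abs(a - b) for a, b in zip(ours, ref)),
        "same_argmax": ours.index(max(ours)) == ref.index(max(ref)),
        "total_variation": 0.5 * sum(abs(a - b) for a, b in zip(p, q)),
    }


# Made-up logits for illustration only; not measured from either implementation.
ours = [2.00, 0.50, -1.00]
ref = [2.01, 0.49, -1.00]
report = compare_logits(ours, ref)
print(report)
```

Small floating-point differences like these leave the predicted token and the overall distribution nearly unchanged even though an exact equality check would fail.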

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive

Star History

2 stars in the last 30 days