mamba2-minimal  by tommyip

Minimal Mamba-2 implementation for efficient sequence modeling

Created 1 year ago
252 stars

Top 99.6% on SourcePulse

Project Summary

A minimal, single-file PyTorch implementation of the Mamba-2 State Space Model (SSM) architecture. Where Transformer self-attention scales quadratically with sequence length, Mamba-2 scales linearly with sequence length during training and runs in constant time per step during inference. The project is aimed at researchers and practitioners who want a compact, readable reference for efficient foundation models.

How It Works

This project implements Mamba-2, an SSM variant that constrains the SSM parameters: in the Mamba-2 paper, the state-transition matrix A is restricted to a scalar multiple of the identity. This restriction allows significantly larger state dimensions and faster training than Mamba-1. As in other SSMs, the model maps an input sequence to an output sequence through a fixed-size hidden state, which keeps compute and memory costs low on long sequences.
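To make the hidden-state idea concrete, here is a minimal sketch of a single-channel SSM with a scalar state transition, written in plain Python so it has no dependencies. It is not the repository's code: the function names, toy sizes, and random inputs are all assumptions made for illustration, and it omits discretization, multiple heads, and the chunked algorithm a real implementation would use. It computes the same output two ways: a recurrent scan that touches each step once with a fixed-size state, and an equivalent attention-like double loop that is quadratic in sequence length.

```python
import math
import random


def ssm_recurrent(a, B, C, x):
    """Scan form: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>.

    a: list of T scalars (hypothetical scalar state transition per step)
    B, C: lists of T vectors of length N (input and output projections)
    x: list of T scalars (one input channel)
    """
    n = len(B[0])
    h = [0.0] * n  # fixed-size state: its size does not grow with T
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def ssm_quadratic(a, B, C, x):
    """Equivalent attention-like form, O(T^2) in sequence length:
    y_t = sum over s <= t of (a_{s+1} * ... * a_t) * <C_t, B_s> * x_s
    """
    T = len(x)
    ys = []
    for t in range(T):
        y = 0.0
        for s in range(t + 1):
            decay = math.prod(a[s + 1 : t + 1])  # empty product is 1 when s == t
            y += decay * sum(c * b for c, b in zip(C[t], B[s])) * x[s]
        ys.append(y)
    return ys


random.seed(0)
T, N = 6, 4  # toy sizes chosen only for illustration
a = [random.uniform(0.5, 1.0) for _ in range(T)]
B = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
C = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
x = [random.gauss(0, 1) for _ in range(T)]

y_rec = ssm_recurrent(a, B, C, x)
y_quad = ssm_quadratic(a, B, C, x)
```

The two outputs agree up to floating-point rounding. The scan form is what gives constant work per generated token at inference time, since only the state h is carried forward.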

Quick Start & Requirements

  • Install dependencies using pip install -r requirements.txt.
  • Requires PyTorch, einops, and transformers.
  • Tested on CPU and MPS (Metal Performance Shaders) backends.
  • A demo.ipynb notebook demonstrates usage with pretrained weights for text generation.
  • See the Mamba-2 paper for architectural details.

Highlighted Details

  • Single-file, minimal implementation for clarity and ease of integration.
  • Device-agnostic design supporting CPU and Apple Silicon (MPS).
  • Core Mamba-2 model implementation provided, with usage examples in demo.ipynb.
  • Output logits follow the same distribution as the reference implementation but are not numerically equivalent.

Maintenance & Community

The project is inspired by johnma2006/mamba-minimal and implements the Mamba-2 architecture by Gu and Dao. No specific community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The README does not specify a license. Confirm the licensing terms with the author before any use, especially commercial use.

Limitations & Caveats

The implementation carries open TODOs, including possibly removing the einops dependency if that can be done without hurting readability. The README explicitly states that the output logits are not numerically equivalent to those of the reference Mamba-2 implementation.
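As a rough illustration of what "not numerically equivalent" can mean in practice, the sketch below compares two logit vectors that differ only by small drift. The values are invented for the example and do not come from either implementation; the helper names are hypothetical. Exact equality fails, while a tolerance check, the KL divergence between the softmax distributions, and the argmax all indicate that the two outputs agree closely.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical logits over a 5-token vocabulary. The second vector mimics
# small floating-point drift, e.g. from a different order of operations.
reference_logits = [2.0, -1.0, 0.5, 3.2, -0.7]
drift = [1e-4, -2e-4, 5e-5, -1e-4, 3e-4]
minimal_logits = [v + d for v, d in zip(reference_logits, drift)]

exactly_equal = reference_logits == minimal_logits
max_abs_diff = max(abs(r - m) for r, m in zip(reference_logits, minimal_logits))
kl = kl_divergence(softmax(reference_logits), softmax(minimal_logits))
same_argmax = (
    reference_logits.index(max(reference_logits))
    == minimal_logits.index(max(minimal_logits))
)
```

When comparing against a reference in a real test, a tolerance-based or distribution-level check of this kind is usually more informative than exact equality; the appropriate tolerance depends on dtype and backend.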

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
