Awesome-state-space-models by radarFudan

Collection of papers on state-space models

Created 2 years ago
604 stars

Top 54.2% on SourcePulse

Project Summary

This repository is a curated collection of research papers and code related to State-Space Models (SSMs), serving as a comprehensive resource for researchers and practitioners exploring alternatives to Transformers for sequence modeling. It highlights advancements in SSM architectures, theoretical analyses, and applications across various domains like language, vision, and reinforcement learning.

How It Works

The collection showcases how SSMs, particularly variants like Mamba, aim to avoid the attention cost of Transformers, which grows quadratically with sequence length, by modeling sequences with linear-time recurrences and selective state updates. Key innovations include input-dependent gating mechanisms, hardware-aware parallel scan implementations, and novel parameterization techniques that improve stability and performance on long sequences. This approach offers a compelling trade-off between computational efficiency and modeling power.
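
To make the mechanism concrete, below is a minimal, illustrative sketch of a selective linear recurrence in PyTorch. It is not taken from this repository or any specific paper; the module, its layer names, and the sigmoid gate are assumptions chosen for readability. It shows the two ingredients described above: a scan whose cost is linear in sequence length, and a state update gated by the current input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveScan(nn.Module):
    """Toy selective SSM layer (illustrative, not from the repo):
    a linear recurrence whose input gate depends on the current token."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(d_state))  # state decay, log-parameterized
        self.in_proj = nn.Linear(d_model, d_state)       # token -> state input
        self.gate_proj = nn.Linear(d_model, d_state)     # token -> input-dependent gate
        self.out_proj = nn.Linear(d_state, d_model)      # state -> output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); total cost is linear in seq_len
        a = torch.exp(-F.softplus(self.log_a))           # decay in (0, 1) keeps the scan stable
        h = x.new_zeros(x.shape[0], a.shape[0])          # hidden state, (batch, d_state)
        ys = []
        for t in range(x.shape[1]):
            xt = x[:, t]                                 # current token, (batch, d_model)
            gate = torch.sigmoid(self.gate_proj(xt))     # "selective": gate depends on input
            h = a * h + gate * self.in_proj(xt)          # gated linear recurrence
            ys.append(self.out_proj(h))
        return torch.stack(ys, dim=1)                    # (batch, seq_len, d_model)

# Example: y = SelectiveScan(d_model=64)(torch.randn(2, 128, 64))
```

Real implementations such as Mamba replace the Python loop with a hardware-aware parallel scan; the loop here is only for clarity.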

Quick Start & Requirements

This is a collection of papers and links, not a runnable library. To explore specific implementations, users should refer to the linked GitHub repositories for each paper. Requirements will vary per project but generally include Python and deep learning frameworks like PyTorch or JAX.

Highlighted Details

  • Mamba Architecture: Features papers detailing Mamba, a selective SSM that achieves linear time complexity with input-dependent gating (the underlying recurrence is written out after this list).
  • Broad Applications: Covers SSM applications in language modeling, computer vision (VMamba, U-Mamba), reinforcement learning, and time series forecasting.
  • Theoretical Foundations: Includes research on generalization error analysis, stability, parameterization, and the theoretical expressivity of SSMs.
  • Transformer Comparisons: Presents studies that directly compare SSM performance against Transformers, often highlighting competitive or superior results on long-sequence tasks.
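
For reference, most of the listed papers build on the discretized linear state-space recurrence below. In selective variants such as Mamba, the discretized matrices and the readout carry a time subscript because they are computed from the current input $x_t$; earlier SSMs such as S4 keep $A$, $B$, and $C$ fixed:

```latex
h_t = \bar{A}_t \, h_{t-1} + \bar{B}_t \, x_t, \qquad y_t = C_t \, h_t
```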

Maintenance & Community

The repository is maintained by radarFudan and appears to be actively updated with recent publications in the field. Links to GitHub repositories for many papers are provided, facilitating community engagement with specific implementations.

Licensing & Compatibility

The repository itself is a collection of links and does not have a specific license. Individual linked repositories will have their own licenses, which should be checked for compatibility with commercial or closed-source use.

Limitations & Caveats

This resource is a bibliography and code index, not a unified framework. Users must navigate individual paper repositories for setup, dependencies, and specific usage instructions. The rapid evolution of the field means some linked papers or code might become outdated.

Health Check

  • Last Commit: 20 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

Explore Similar Projects

recurrent-pretraining by seal-rg

0.2% · 840 stars
Pretraining code for depth-recurrent language model research
Created 9 months ago · Updated 2 weeks ago
Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI) and Logan Kilpatrick (Product Lead on Google AI Studio).

model-zoo by FluxML

0% · 933 stars
Julia/FluxML model demos
Created 8 years ago · Updated 11 months ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

Hands-On-Large-Language-Models by HandsOnLLM

2.3% · 17k stars
Code examples for "Hands-On Large Language Models" book
Created 1 year ago · Updated 3 months ago