mamba-chat by redotvideo

Chat LLM based on state-space model architecture

created 1 year ago
927 stars

Top 40.2% on sourcepulse

Project Summary

Mamba-Chat is a conversational large language model that leverages the Mamba state-space architecture, offering an alternative to Transformer-based models. It is designed for researchers and developers interested in exploring efficient, linear-time sequence modeling for LLMs.

How It Works

Mamba-Chat is built upon the Mamba-2.8B model, which utilizes a selective state-space architecture. This approach allows for linear time complexity in sequence modeling, a significant departure from the quadratic complexity of traditional Transformers. The model was fine-tuned on a subset of the HuggingFaceH4/ultrachat_200k dataset, enabling it to perform conversational tasks.
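To show why the cost is linear in sequence length, here is a deliberately simplified, single-channel sketch of a selective state-space recurrence in the spirit of Mamba. It is not the hardware-aware CUDA kernel shipped in the `mamba_ssm` package. The scalar and vector parameters `w_b`, `w_c`, `w_delta` and `b_delta` are hypothetical stand-ins for the learned input projections, and the input term uses the simplified discretization `delta * B_t`. The key point is that the hidden state `h` has a fixed size, so each new token costs the same amount of work no matter how long the sequence is. Attention, by contrast, compares each token against all previous ones.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    xs: List[float],
    a: List[float],      # diagonal state matrix entries (negative values decay the state)
    w_b: List[float],    # hypothetical projection producing the input-dependent B_t
    w_c: List[float],    # hypothetical projection producing the input-dependent C_t
    w_delta: float,      # hypothetical projection producing the step size delta_t
    b_delta: float = 0.0,
) -> List[float]:
    n = len(a)
    h = [0.0] * n  # fixed-size hidden state, independent of sequence length
    ys = []
    for x in xs:  # a single pass over the sequence: O(L * N) time overall
        delta = softplus(w_delta * x + b_delta)  # "selective": step size depends on input
        b_t = [w * x for w in w_b]               # input-dependent B_t
        c_t = [w * x for w in w_c]               # input-dependent C_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])             # zero-order-hold discretization of A
            h[i] = a_bar * h[i] + delta * b_t[i] * x   # simplified B_bar = delta * B_t
        ys.append(sum(c * hi for c, hi in zip(c_t, h)))  # readout y_t = C_t . h_t
    return ys
```

For example, `selective_scan([1.0, 0.5, -0.2], a=[-1.0, -0.5], w_b=[1.0, 0.5], w_c=[0.3, 0.7], w_delta=1.0)` returns one output per input token. Doubling the input length doubles the work, rather than quadrupling it as full self-attention would.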

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run CLI chatbot: python chat.py
  • Run Gradio app: pip install gradio==4.8.0 then python app.py --share
  • Fine-tuning requires a dataset (e.g., data/ultrachat_small.jsonl) and a GPU with at least 24GB VRAM for recommended settings.
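Before launching a fine-tuning run, it can help to sanity-check the JSONL file. The sketch below assumes a hypothetical schema in which each line is a JSON object with a `"messages"` list of `{"role", "content"}` entries, mirroring the `messages` column of HuggingFaceH4/ultrachat_200k. The exact format that this repository's training script expects is not stated here, so adjust the field names to match it.

```python
import json
from typing import Dict, Iterable, List

# Assumed role names; the repository's actual schema may differ.
VALID_ROLES = {"system", "user", "assistant"}


def load_chat_jsonl(lines: Iterable[str]) -> List[List[Dict[str, str]]]:
    """Parse JSONL chat data, assuming {"messages": [{"role": ..., "content": ...}, ...]} per line."""
    conversations = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"line {lineno}: missing or empty 'messages' list")
        for msg in messages:
            if (
                not isinstance(msg, dict)
                or msg.get("role") not in VALID_ROLES
                or not isinstance(msg.get("content"), str)
            ):
                raise ValueError(f"line {lineno}: malformed message {msg!r}")
        conversations.append(messages)
    return conversations
```

Usage: `with open("data/ultrachat_small.jsonl") as f: convs = load_chat_jsonl(f)` reads the file line by line and fails fast on the first malformed record.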

Highlighted Details

  • Described as the first chat LLM based on the Mamba state-space architecture.
  • Fine-tuned on 16,000 samples from HuggingFaceH4/ultrachat_200k.
  • Provides code for testing and fine-tuning.
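This summary does not say how the 16,000 training samples were chosen. If you want to build a comparable subset yourself, one simple option is a reproducible uniform random draw without replacement, sketched below. The function name, the seed, and the 200,000-row placeholder are illustrative assumptions, not the authors' procedure.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_subset(records: Sequence[T], k: int, seed: int = 0) -> List[T]:
    """Draw k distinct records uniformly at random, reproducibly for a given seed."""
    if k > len(records):
        raise ValueError(f"requested {k} records but only {len(records)} available")
    rng = random.Random(seed)      # private RNG, so global random state is untouched
    return rng.sample(records, k)  # sampling without replacement
```

For example, `sample_subset(range(200_000), 16_000, seed=42)` returns 16,000 distinct indices, and calling it again with the same seed yields the same subset. A fixed seed makes the data selection repeatable across runs.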

Maintenance & Community

  • Project is hosted on GitHub.
  • Community interaction is available via the Haven Community Discord.
  • Model is available on Hugging Face.

Licensing & Compatibility

  • The repository does not explicitly state a license. The underlying Mamba model implementation by Tri Dao is Apache 2.0 licensed.

Limitations & Caveats

The repository does not specify a license for the Mamba-Chat model itself, which may affect commercial use or integration into closed-source projects. The fine-tuning instructions assume a GPU with at least 24GB of VRAM.
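A back-of-envelope estimate shows why GPU memory is the binding constraint for fine-tuning. The sketch below counts only per-parameter state and ignores activations. It takes 2.8 billion parameters at face value from the model name. The bytes-per-parameter figures are common textbook assumptions, not settings taken from this repository.

```python
def training_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Per-parameter memory in decimal gigabytes (1 GB = 1e9 bytes); activations ignored."""
    return num_params * bytes_per_param / 1e9


PARAMS = 2.8e9  # from the name "Mamba-2.8B"; the exact count differs slightly

# Assumed bytes per parameter for a few illustrative setups.
SETUPS = {
    "bf16 weights only (inference)": 2,
    "fp32 weights + fp32 grads + fp32 AdamW moments": 4 + 4 + 8,
    "bf16 weights + bf16 grads + 8-bit optimizer moments (no fp32 master copy)": 2 + 2 + 2,
}

for name, bpp in SETUPS.items():
    print(f"{name}: ~{training_memory_gb(PARAMS, bpp):.1f} GB")
```

This prints roughly 5.6 GB, 44.8 GB and 16.8 GB. Plain fp32 AdamW would not fit on a 24GB card even before activations are counted. Lower-precision weights and compact optimizer state are therefore what make a single 24GB GPU plausible.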

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Calvin French-Owen (Co-founder of Segment), and 12 more.

StableLM by Stability-AI

0.0%
16k
Language models by Stability AI
created 2 years ago
updated 1 year ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Shawn Wang (Editor of Latent Space), and 9 more.

mamba by state-spaces

0.4%
16k
Mamba SSM architecture for sequence modeling
created 1 year ago
updated 2 weeks ago