Chat LLM based on state-space model architecture
Mamba-Chat is a conversational large language model that leverages the Mamba state-space architecture, offering an alternative to Transformer-based models. It is designed for researchers and developers interested in exploring efficient, linear-time sequence modeling for LLMs.
How It Works
Mamba-Chat is built on the Mamba-2.8B model, which uses a selective state-space architecture. Its cost scales linearly with sequence length, a significant departure from the quadratic attention cost of traditional Transformers. The model was fine-tuned on a subset of the HuggingFaceH4/ultrachat_200k dataset, enabling it to handle conversational tasks.
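To make the linear-time claim concrete, here is a toy NumPy sketch of a selective state-space scan: the hidden state is updated once per token, and the step size, input matrix, and readout are all derived from the current input (the "selective" part), so a length-L sequence is processed in a single O(L) pass. This is an illustration of the mechanism only; the names and shapes are made up, and the real Mamba layer uses learned projections and a fused parallel scan.

import numpy as np

def selective_scan(x, d_state=16):
    # Toy selective SSM over a 1-D input sequence x of length L.
    # Illustrative only -- not the repository's implementation.
    L, = x.shape
    rng = np.random.default_rng(0)
    A = -np.exp(rng.standard_normal(d_state))           # diagonal, stable state matrix
    W_dt, W_B, W_C = rng.standard_normal((3, d_state))  # toy stand-ins for learned projections

    h = np.zeros(d_state)
    y = np.empty(L)
    for t in range(L):                                  # one pass: O(L), not O(L^2)
        dt = np.logaddexp(0.0, W_dt * x[t])             # input-dependent step size (softplus)
        A_bar = np.exp(dt * A)                          # zero-order-hold discretization
        B_bar = (A_bar - 1.0) / A * (W_B * x[t])        # input-dependent input matrix
        h = A_bar * h + B_bar * x[t]                    # state update: O(1) work per token
        y[t] = (W_C * x[t]) @ h                         # input-dependent readout
    return y

print(selective_scan(np.linspace(-1.0, 1.0, 32)))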
Quick Start & Requirements
Install the dependencies and start a terminal chat session:

pip install -r requirements.txt
python chat.py
For the Gradio web UI, install Gradio and launch the app with sharing enabled:

pip install gradio==4.8.0
python app.py --share
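For programmatic use, the sketch below loads the model and generates a reply. It assumes the mamba-ssm package is installed, that the fine-tuned weights are published on the Hugging Face Hub as havenhq/mamba-chat, and that the model expects a Zephyr-style chat format; verify all three against the repository before relying on them.

import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
model_id = "havenhq/mamba-chat"  # assumed Hub ID -- check the repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MambaLMHeadModel.from_pretrained(model_id, device=device, dtype=torch.float16)

# Zephyr-style chat format (assumed to match the fine-tuning data).
prompt = "<|user|>\nWhat is a state-space model?</s>\n<|assistant|>\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

out = model.generate(input_ids=input_ids,
                     max_length=input_ids.shape[1] + 128,
                     temperature=0.9, top_p=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))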
Fine-tuning expects a conversation dataset (e.g., data/ultrachat_small.jsonl) and a GPU with at least 24GB VRAM for the recommended settings.
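A possible training invocation is sketched below; the script name train_mamba.py and its flags are assumptions about the project layout, so check the repository's training script for the exact arguments:

python train_mamba.py \
    --model state-spaces/mamba-2.8b \
    --tokenizer EleutherAI/gpt-neox-20b \
    --learning_rate 5e-5 \
    --batch_size 4 \
    --data_path ./data/ultrachat_small.jsonl \
    --num_epochs 3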
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository does not specify a license for the Mamba-Chat model itself, which may complicate commercial use or integration into closed-source projects. The fine-tuning instructions also assume specific hardware (a GPU with at least 24GB VRAM).