mamba-chat  by redotvideo

Chat LLM based on state-space model architecture

Created 1 year ago
931 stars

Top 39.3% on SourcePulse

GitHubView on GitHub
Project Summary

Mamba-Chat is a conversational large language model that leverages the Mamba state-space architecture, offering an alternative to Transformer-based models. It is designed for researchers and developers interested in exploring efficient, linear-time sequence modeling for LLMs.

How It Works

Mamba-Chat is built upon the Mamba-2.8B model, which utilizes a selective state-space architecture. This approach allows for linear time complexity in sequence modeling, a significant departure from the quadratic complexity of traditional Transformers. The model was fine-tuned on a subset of the HuggingFaceH4/ultrachat_200k dataset, enabling it to perform conversational tasks.

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run CLI chatbot: python chat.py
  • Run Gradio app: pip install gradio==4.8.0 then python app.py --share
  • Fine-tuning requires a dataset (e.g., data/ultrachat_small.jsonl) and a GPU with at least 24GB VRAM for recommended settings.

Highlighted Details

  • First chat LLM based on Mamba state-space architecture.
  • Fine-tuned on 16,000 samples from HuggingFaceH4/ultrachat_200k.
  • Provides code for testing and fine-tuning.

Maintenance & Community

  • Project is hosted on GitHub.
  • Community interaction is available via the Haven Community Discord.
  • Model is available on Huggingface.

Licensing & Compatibility

  • The repository does not explicitly state a license. The underlying Mamba model implementation by Tri Dao is Apache 2.0 licensed.

Limitations & Caveats

The repository does not specify the license for the Mamba-Chat model itself, which may impact commercial use or integration into closed-source projects. Fine-tuning instructions suggest specific hardware requirements (24GB GPU).

Health Check
Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.