Chat LLM based on state-space model architecture
Mamba-Chat is a conversational large language model that leverages the Mamba state-space architecture, offering an alternative to Transformer-based models. It is designed for researchers and developers interested in exploring efficient, linear-time sequence modeling for LLMs.
How It Works
Mamba-Chat is built on the Mamba-2.8B model, which uses a selective state-space architecture. Its cost scales linearly with sequence length, a significant departure from the quadratic attention cost of traditional Transformers. The model was fine-tuned on a subset of the HuggingFaceH4/ultrachat_200k dataset, enabling it to handle conversational tasks.
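To make the linear-time claim concrete, here is a toy NumPy sketch of a selective state-space scan: the hidden state is updated once per token, and the step size, input matrix, and readout are all derived from the current input (the "selective" part), so a length-L sequence is processed in a single O(L) pass. This is an illustration of the mechanism only; the names and shapes are made up, and the real Mamba layer uses learned projections and a fused parallel scan.

import numpy as np

def selective_scan(x, d_state=16):
    # Toy selective SSM over a 1-D input sequence x of length L.
    # Illustrative only -- not the repository's implementation.
    L, = x.shape
    rng = np.random.default_rng(0)
    A = -np.exp(rng.standard_normal(d_state))           # diagonal, stable state matrix
    W_dt, W_B, W_C = rng.standard_normal((3, d_state))  # toy stand-ins for learned projections

    h = np.zeros(d_state)
    y = np.empty(L)
    for t in range(L):                                  # one pass: O(L), not O(L^2)
        dt = np.logaddexp(0.0, W_dt * x[t])             # input-dependent step size (softplus)
        A_bar = np.exp(dt * A)                          # zero-order-hold discretization
        B_bar = (A_bar - 1.0) / A * (W_B * x[t])        # input-dependent input matrix
        h = A_bar * h + B_bar * x[t]                    # state update: O(1) work per token
        y[t] = (W_C * x[t]) @ h                         # input-dependent readout
    return y

print(selective_scan(np.linspace(-1.0, 1.0, 32)))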
Quick Start & Requirements
Install the dependencies and start a terminal chat session:

pip install -r requirements.txt
python chat.py
For the Gradio web UI, install Gradio and launch the app with sharing enabled:

pip install gradio==4.8.0
python app.py --share
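For programmatic use, the sketch below loads the model and generates a reply. It assumes the mamba-ssm package is installed, that the fine-tuned weights are published on the Hugging Face Hub as havenhq/mamba-chat, and that the model expects a Zephyr-style chat format; verify all three against the repository before relying on them.

import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
model_id = "havenhq/mamba-chat"  # assumed Hub ID -- check the repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MambaLMHeadModel.from_pretrained(model_id, device=device, dtype=torch.float16)

# Zephyr-style chat format (assumed to match the fine-tuning data).
prompt = "<|user|>\nWhat is a state-space model?</s>\n<|assistant|>\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

out = model.generate(input_ids=input_ids,
                     max_length=input_ids.shape[1] + 128,
                     temperature=0.9, top_p=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))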
Fine-tuning expects a conversation dataset (e.g., data/ultrachat_small.jsonl) and a GPU with at least 24GB VRAM for the recommended settings.
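A possible training invocation is sketched below; the script name train_mamba.py and its flags are assumptions about the project layout, so check the repository's training script for the exact arguments:

python train_mamba.py \
    --model state-spaces/mamba-2.8b \
    --tokenizer EleutherAI/gpt-neox-20b \
    --learning_rate 5e-5 \
    --batch_size 4 \
    --data_path ./data/ultrachat_small.jsonl \
    --num_epochs 3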
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository does not specify a license for the Mamba-Chat model itself, which may complicate commercial use or integration into closed-source projects. The fine-tuning instructions also assume specific hardware (a GPU with at least 24GB VRAM).