Open-source speech-language assistant for multimodal conversation
LLaSM is an open-source, commercially usable conversational model supporting bilingual (Chinese/English) speech-text multimodal dialogue. It aims to simplify interaction with large language models by enabling direct voice input, bypassing the complexity and potential transcription errors of traditional Automatic Speech Recognition (ASR) pipelines.
How It Works
LLaSM integrates a large language model (LLM) with a speech processing component, allowing end-to-end voice-based conversations. This multimodal approach processes audio input directly, fusing it with textual information for a more natural and efficient user experience. The model builds on pre-trained LLMs such as Chinese-Llama-2-7B or Baichuan-7B and incorporates a Whisper-based speech encoder for audio understanding.
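As a rough illustration of that fusion step, here is a minimal hypothetical sketch: the SpeechAdaptor module and its names are assumptions rather than LLaSM's actual code, though 1280 matches Whisper large-v2's hidden size and 4096 matches a 7B Llama-style LLM.

```python
import torch
import torch.nn as nn

class SpeechAdaptor(nn.Module):
    """Hypothetical adaptor: projects speech-encoder features into the
    LLM's embedding space so both modalities share one sequence."""

    def __init__(self, speech_dim: int = 1280, llm_dim: int = 4096):
        super().__init__()
        # 1280 = Whisper large-v2 hidden size; 4096 = 7B Llama-style hidden size.
        self.proj = nn.Linear(speech_dim, llm_dim)

    def forward(self, speech_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, frames, speech_dim) from the audio encoder
        # text_embeds:  (batch, tokens, llm_dim) from the LLM's embedding table
        speech_embeds = self.proj(speech_feats)
        # Concatenate along the sequence axis so the LLM attends over both modalities.
        return torch.cat([speech_embeds, text_embeds], dim=1)

# Toy tensors stand in for real Whisper features and tokenized text.
adaptor = SpeechAdaptor()
fused = adaptor(torch.randn(1, 50, 1280), torch.randn(1, 8, 4096))
print(fused.shape)  # torch.Size([1, 58, 4096])
```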
Quick Start & Requirements
Set up the environment with conda and pip:
git clone https://github.com/LinkSoul-AI/LLaSM
cd LLaSM
conda create -n llasm python=3.10 -y
conda activate llasm
pip install --upgrade pip
pip install -e .
Select the inference device via an environment variable:
LLASM_DEVICE="cuda:0"
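A minimal sketch of how a launch script might honor that variable; the variable name comes from the quick start above, but the surrounding code is illustrative, not LLaSM's actual entry point.

```python
import os
import torch

# Fall back to CPU if LLASM_DEVICE is unset and no GPU is available.
device_name = os.environ.get(
    "LLASM_DEVICE", "cuda:0" if torch.cuda.is_available() else "cpu"
)
device = torch.device(device_name)
print(f"Running inference on {device}")
# model.to(device)  # move the loaded model to the chosen device
```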
Requirements: a GPU (select the device with LLASM_DEVICE, as above), Python 3.10, and the Whisper large-v2 model.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README mentions a TODO for int4 quantization and Docker deployment, suggesting these features may be under development or not yet fully documented.
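Until that lands, 4-bit loading can typically be approximated with Hugging Face transformers plus bitsandbytes. The snippet below is a generic sketch under that assumption, not LLaSM's supported path, and the checkpoint id is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit at load time
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "LinkSoul/Chinese-Llama-2-7b",         # assumed base LLM; swap in the real checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```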