LLaSM by LinkSoul-AI

Open-source speech-language assistant for multimodal conversation

Created 2 years ago

559 stars

Top 57.3% on SourcePulse


1 Expert Loves This Project

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

LLaSM is an open-source, commercially viable conversational model supporting bilingual (Chinese/English) speech-text multimodal dialogue. It aims to simplify user interaction with large language models by enabling direct voice input, bypassing the complexities and potential errors of traditional Automatic Speech Recognition (ASR) pipelines.

How It Works

LLaSM pairs a large language model (LLM) with a speech-processing component so that conversations can run end to end on voice input. Audio is processed directly and fused with the text input, rather than first being transcribed by a separate ASR stage. The model builds on a pre-trained LLM (Chinese-Llama-2-7B or Baichuan-7B) and adds a speech encoder; because the prerequisites list the Whisper large v2 model, the encoder is likely Whisper-based.
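The fusion step described above can be illustrated with a toy sketch. It assumes, as one common design for speech-text models, that audio-encoder frames are mapped by a linear adaptor into the LLM's embedding space and spliced into the text embedding sequence. The names `project`, `fuse`, `weights`, and `audio_slot` are hypothetical and are not taken from the LLaSM codebase; the sizes are deliberately tiny.

```python
from typing import List

Vector = List[float]


def project(frame: Vector, weights: List[Vector]) -> Vector:
    """Linear map from one audio-encoder frame to the LLM embedding size.

    `weights` is a hypothetical adaptor matrix (one row per output dimension).
    """
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def fuse(text_embeds: List[Vector], audio_frames: List[Vector],
         weights: List[Vector], audio_slot: int) -> List[Vector]:
    """Splice projected audio frames into the text sequence at `audio_slot`."""
    if not 0 <= audio_slot <= len(text_embeds):
        raise ValueError("audio_slot out of range")
    projected = [project(frame, weights) for frame in audio_frames]
    return text_embeds[:audio_slot] + projected + text_embeds[audio_slot:]


# Toy sizes (assumption): audio frames are 3-dim, LLM embeddings are 2-dim.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
text_embeds = [[0.1, 0.2], [0.3, 0.4]]             # prompt tokens around the audio
audio_frames = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # stand-in for encoder output

fused = fuse(text_embeds, audio_frames, weights, audio_slot=1)
print(len(fused))  # 2 text embeddings + 2 projected audio frames
```

In a real system the fused sequence would be passed to the LLM in place of plain token embeddings, which is what lets the model skip a separate transcription step.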

Quick Start & Requirements

Install: Clone the repository and install dependencies using conda and pip.

git clone https://github.com/LinkSoul-AI/LLaSM
cd LLaSM
conda create -n llasm python=3.10 -y
conda activate llasm
pip install --upgrade pip
pip install -e .

Prerequisites: CUDA-enabled GPU (for LLASM_DEVICE="cuda:0"), Python 3.10, Whisper large v2 model.
Demo: Available at Hugging Face Spaces.
Paper: arXiv:2308.15930
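The prerequisites mention a device setting of the form LLASM_DEVICE="cuda:0". As a minimal sketch, assuming this value is supplied through an environment variable (the listing does not say how it is read), a launcher script could validate it before loading any model. The helpers `parse_device` and `device_from_env`, and the choice of "cuda:0" as the fallback, are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def parse_device(spec: str) -> Tuple[str, Optional[int]]:
    """Split a device string such as "cuda:0" into (kind, index)."""
    kind, sep, index = spec.strip().partition(":")
    if kind not in ("cuda", "cpu"):
        raise ValueError(f"unsupported device: {spec!r}")
    if not sep:
        return kind, None          # e.g. "cpu" or bare "cuda"
    if not index.isdigit():
        raise ValueError(f"bad device index in {spec!r}")
    return kind, int(index)


def device_from_env(env: Mapping[str, str] = os.environ,
                    default: str = "cuda:0") -> Tuple[str, Optional[int]]:
    """Read LLASM_DEVICE (assumed to be an environment variable) with a fallback."""
    return parse_device(env.get("LLASM_DEVICE", default))


if __name__ == "__main__":
    kind, index = device_from_env()
    print(f"device kind={kind}, index={index}")
```

Validating the string up front gives a clear error for a typo such as "cuda0" instead of a failure deep inside model loading.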

Highlighted Details

First open-source, commercially usable model for bilingual Chinese/English speech-text multimodal dialogue.
Offers direct voice input, avoiding separate ASR steps.
Supports Chinese-Llama-2-7B and Baichuan-7B LLMs.
Includes a Chinese/English speech SFT dataset (LLaSM-Audio-Instructions).

Maintenance & Community

The most recent activity signal is the arXiv publication (2023); the last commit was 2 years ago and the repository is marked inactive.
Community interaction via WeChat group mentioned.

Licensing & Compatibility

License: Apache-2.0.
Compatibility: Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The README lists int4 quantization and Docker deployment as TODO items, so these features may be unfinished or undocumented and should not be assumed available.

Health Check

Last Commit: 2 years ago

Responsiveness: Inactive

Star History

0 stars in the last 30 days