Speech-language model for low-latency, high-quality speech interaction
LLaMA-Omni provides a low-latency, end-to-end speech interaction model built on Llama-3.1-8B-Instruct, enabling simultaneous text and speech response generation from speech input. It targets researchers and developers seeking GPT-4o-level speech capabilities in an open-source framework.
How It Works
LLaMA-Omni couples a speech encoder (borrowed from SLAM-LLM) and a speech-to-speech (s2s) adaptor with the Llama-3.1-8B-Instruct LLM. This architecture lets it process spoken instructions and generate the text and speech responses concurrently: the speech output is decoded in a streaming fashion rather than after the text is finished, which is what keeps latency low.
Quick Start & Requirements
Requirements: Python 3.10. Install the package from the repository root (`pip install -e .`), then install fairseq from source (`pip install -e .` inside the cloned fairseq repo) and FlashAttention (`pip install flash-attn --no-build-isolation`).
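A setup sequence assembled from the steps above might look like the following sketch; the conda environment name is illustrative, and the repository URLs are assumptions, so check the project's README for the authoritative steps:

```bash
# Create an isolated environment with the required Python version (name is illustrative).
conda create -n llama-omni python=3.10
conda activate llama-omni

# Install LLaMA-Omni itself in editable mode from the repository root (URL assumed).
git clone https://github.com/ictnlp/LLaMA-Omni
cd LLaMA-Omni
pip install -e .

# Install fairseq from source, as described above (URL assumed).
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install -e .
cd ..

# Install FlashAttention without build isolation, as noted above.
pip install flash-attn --no-build-isolation
```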
Launching the Gradio demo requires three processes: a controller (`python -m omni_speech.serve.controller`), a Gradio web server (`python -m omni_speech.serve.gradio_web_server`), and a model worker (`python -m omni_speech.serve.model_worker`).
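A minimal sketch of the launch sequence, assuming the three processes run in separate terminals; the hosts, ports, controller URL, and model path below are illustrative placeholders rather than documented defaults:

```bash
# Terminal 1: start the controller that coordinates model workers (host/port are placeholders).
python -m omni_speech.serve.controller --host 0.0.0.0 --port 10000

# Terminal 2: start the Gradio web UI, pointed at the controller (URL/port are placeholders).
python -m omni_speech.serve.gradio_web_server --controller http://localhost:10000 --port 8000

# Terminal 3: start the worker that serves the speech model (flags and model path are placeholders).
python -m omni_speech.serve.model_worker --host 0.0.0.0 \
  --controller http://localhost:10000 --port 40000 \
  --worker http://localhost:40000 --model-path Llama-3.1-8B-Omni --s2s
```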
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The model is strictly for academic research and cannot be used commercially without a separate license. Streaming audio playback in the Gradio demo is implemented without autoplay due to stability concerns.