ictnlp/LLaMA-Omni: Speech-language model for low-latency, high-quality speech interaction
Top 15.5% on SourcePulse
LLaMA-Omni provides a low-latency, end-to-end speech interaction model built on Llama-3.1-8B-Instruct, enabling simultaneous text and speech response generation from speech input. It targets researchers and developers seeking GPT-4o-level speech capabilities in an open-source framework.
How It Works
LLaMA-Omni integrates a speech encoder (borrowing from SLAM-LLM) and a speech-to-speech (s2s) adaptor with the Llama-3.1-8B-Instruct LLM. This architecture allows it to process spoken instructions, generate text, and synthesize speech concurrently, achieving low latency through efficient model design and training.
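The flow described above can be sketched with stub components. All class names and the toy arithmetic below are illustrative assumptions, not the repo's actual API; the real model uses a pretrained speech encoder, Llama-3.1-8B-Instruct, and a streaming speech decoder that emits discrete units for a vocoder.

```python
# Illustrative sketch of a LLaMA-Omni-style pipeline with stub components.
# Every class here is a placeholder, not the repo's real implementation.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeechEncoder:
    """Stands in for the frozen speech encoder (Whisper-style features)."""
    def encode(self, waveform: List[float]) -> List[float]:
        # Downsample raw samples into a short feature sequence (toy logic).
        return [sum(waveform[i:i + 4]) / 4 for i in range(0, len(waveform), 4)]


@dataclass
class SpeechAdaptor:
    """Projects speech features into the LLM embedding space."""
    scale: float = 0.5
    def project(self, features: List[float]) -> List[float]:
        return [f * self.scale for f in features]


@dataclass
class ToyLLM:
    """Stands in for Llama-3.1-8B-Instruct; emits text tokens."""
    def generate(self, embeddings: List[float]) -> List[str]:
        return [f"tok{i}" for i, _ in enumerate(embeddings)]


@dataclass
class SpeechDecoder:
    """Maps LLM outputs to discrete speech units for a vocoder."""
    def to_units(self, tokens: List[str]) -> List[int]:
        return [hash(t) % 100 for t in tokens]


def respond(waveform: List[float]) -> Tuple[List[str], List[int]]:
    """Produce text tokens and speech units from one spoken instruction.

    In the real model the speech units are generated in a streaming fashion
    alongside the text, which is where the low latency comes from; here both
    outputs are simply derived from the same pass for illustration.
    """
    encoder, adaptor = SpeechEncoder(), SpeechAdaptor()
    llm, decoder = ToyLLM(), SpeechDecoder()
    embeddings = adaptor.project(encoder.encode(waveform))
    text_tokens = llm.generate(embeddings)
    speech_units = decoder.to_units(text_tokens)
    return text_tokens, speech_units
```

The key design point mirrored here is that text and speech outputs share one LLM pass rather than chaining ASR, an LLM, and TTS as separate systems.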
Quick Start & Requirements
Requires Python 3.10 (`python=3.10`). Install the package with `pip install -e .`, run `pip install -e .` again inside the fairseq repo, then install FlashAttention with `pip install flash-attn --no-build-isolation`. To serve the demo, start the controller (`python -m omni_speech.serve.controller`), the Gradio web server (`python -m omni_speech.serve.gradio_web_server`), and a model worker (`python -m omni_speech.serve.model_worker`).
Highlighted Details
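The install and launch steps above can be sketched as a shell session. The environment name and the local fairseq checkout path are assumptions; the module paths come from the project's own serving commands.

```shell
# Environment: the project targets Python 3.10 (env name is an assumption).
conda create -n llama-omni python=3.10 -y
conda activate llama-omni

# Install LLaMA-Omni itself, then fairseq from its own checkout.
pip install -e .
pip install -e ./fairseq   # path to the fairseq repo is an assumption
pip install flash-attn --no-build-isolation

# Launch the serving stack: controller, Gradio UI, and a model worker.
python -m omni_speech.serve.controller &
python -m omni_speech.serve.gradio_web_server &
python -m omni_speech.serve.model_worker &
```

The controller must be running before the web server and worker register with it, so the launch order above matters.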
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The model is strictly for academic research and cannot be used commercially without a separate license. Streaming audio playback in the Gradio demo is implemented without autoplay due to stability concerns.