LLaMA-Omni by ictnlp

Speech-language model for low-latency, high-quality speech interaction

created 10 months ago
2,965 stars

Top 16.5% on sourcepulse

View on GitHub
Project Summary

LLaMA-Omni provides a low-latency, end-to-end speech interaction model built on Llama-3.1-8B-Instruct, enabling simultaneous text and speech response generation from speech input. It targets researchers and developers seeking GPT-4o-level speech capabilities in an open-source framework.

How It Works

LLaMA-Omni couples a Whisper-large-v3 speech encoder and a speech adaptor (with encoder and adaptor code adapted from SLAM-LLM) to the Llama-3.1-8B-Instruct LLM. Spoken instructions are encoded and projected into the LLM, which generates the text response while a streaming speech decoder simultaneously produces discrete units that the unit-based HiFi-GAN vocoder converts to audio. Because text and speech are produced in parallel rather than through a cascaded pipeline, response latency stays low.

Quick Start & Requirements

  • Install: clone the repo, create a conda env (python=3.10), run pip install -e . in the repo root and again inside the fairseq checkout, then pip install flash-attn --no-build-isolation (consolidated in the sketch after this list).
  • Prerequisites: the Llama-3.1-8B-Omni model (Hugging Face), the Whisper-large-v3 speech encoder, and the unit-based HiFi-GAN vocoder.
  • Demo: launch the controller (python -m omni_speech.serve.controller), the Gradio web server (python -m omni_speech.serve.gradio_web_server), and a model worker (python -m omni_speech.serve.model_worker).
  • Docs: https://github.com/ictnlp/LLaMA-Omni
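
For orientation, the install and demo steps above roughly amount to the shell session below. This is a minimal sketch, assuming the layout described in these bullets: the conda env name, the fairseq checkout location, and the per-process arguments (host/port, model paths) are illustrative, so check the repository README for the exact invocations.

```bash
# Setup sketch for LLaMA-Omni; names and omitted flags are illustrative,
# not the authoritative instructions from the repo README.
git clone https://github.com/ictnlp/LLaMA-Omni
cd LLaMA-Omni
conda create -n llama-omni python=3.10 -y   # env name is an assumption
conda activate llama-omni
pip install -e .                            # install the package itself

# fairseq is installed the same way from its own checkout
git clone https://github.com/pytorch/fairseq
(cd fairseq && pip install -e .)

pip install flash-attn --no-build-isolation

# Demo: run each server process in its own terminal (or background it).
# Each one takes additional arguments (host/port, model paths) that are
# documented in the repo and omitted here.
python -m omni_speech.serve.controller
python -m omni_speech.serve.model_worker        # serves Llama-3.1-8B-Omni
python -m omni_speech.serve.gradio_web_server
```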

Highlighted Details

  • Built on Llama-3.1-8B-Instruct for high-quality text responses.
  • Achieves speech interaction latency as low as 226ms.
  • Supports simultaneous text and speech response generation.
  • Trained in under 3 days on 4 GPUs.

Maintenance & Community

  • LLaMA-Omni2 released with models from 0.5B to 32B parameters.
  • Accepted at ICLR 2025.
  • Contact: fangqingkai21b@ict.ac.cn for questions.
  • Citation details provided.

Licensing & Compatibility

  • Code: Apache-2.0 License.
  • Model: Intended for academic research only; NOT for commercial purposes. Commercial use requires contacting fengyang@ict.ac.cn for a license.

Limitations & Caveats

The model is strictly for academic research and cannot be used commercially without a separate license. Streaming audio playback in the Gradio demo is implemented without autoplay due to stability concerns.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

66 stars in the last 90 days

Explore Similar Projects

ultravox by fixie-ai
Multimodal LLM for real-time voice interactions
Top 0.4% on sourcepulse · 4k stars · created 1 year ago · updated 4 days ago
Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.