LLaMA-Omni2 by ictnlp

LLM-based real-time spoken chatbot

Created 10 months ago

263 stars

Top 96.9% on SourcePulse

Project Summary

LLaMA-Omni 2 is a series of speech-language models built on Qwen2.5-Instruct and designed for real-time spoken chatbots. The models generate text and speech simultaneously, and a novel streaming autoregressive speech decoder keeps interactions high-quality and low-latency. The project targets researchers and developers who want conversational AI with integrated speech capabilities.

How It Works

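The system pairs a Qwen2.5 LLM with a Whisper-large-v3 speech encoder for spoken input and a streaming autoregressive speech decoder for spoken output. The pipeline works as follows:
- The LLM generates its text response token by token.
- The decoder turns the partial text into speech tokens as it goes.
- CosyVoice 2's flow-matching model and vocoder convert those tokens into audio.

Because text and speech are produced together, playback can begin before the full response is complete. The core innovation is this decoder: generating speech autoregressively and in a streaming fashion is intended to improve both speech quality and responsiveness over earlier approaches.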
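The simultaneous text/speech idea can be illustrated with a toy read/write schedule. After every `read_n` text tokens, a speech decoder autoregressively emits `write_n` speech units. Each unit is conditioned on the text seen so far and on the units already emitted, so audio can start before the text is finished.

This is a sketch under stated assumptions, not the project's implementation. The names `stream_text_and_speech` and `toy_unit` are hypothetical, and so are the default values of `read_n` and `write_n`. The real decoder's inputs, and the CosyVoice 2 stage that turns units into a waveform, are not modeled.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, Union

# An event is either ("text", token) or ("speech", unit_id).
Event = Tuple[str, Union[str, int]]


def stream_text_and_speech(
    text_tokens: Iterable[str],
    next_unit: Callable[[List[str], List[int]], int],
    read_n: int = 3,   # hypothetical: text tokens read per speech burst
    write_n: int = 10,  # hypothetical: speech units written per burst
) -> Iterator[Event]:
    """Interleave text tokens with bursts of speech units (toy schedule).

    After every `read_n` text tokens, `next_unit` is called `write_n` times.
    Each call sees the text prefix plus all previously emitted units, which
    makes the speech side autoregressive.
    """
    text: List[str] = []
    units: List[int] = []
    pending = 0  # text tokens read since the last speech burst

    def burst() -> Iterator[Event]:
        for _ in range(write_n):
            unit = next_unit(text, units)  # conditioned on prefix + prior units
            units.append(unit)
            yield ("speech", unit)

    for token in text_tokens:
        text.append(token)
        yield ("text", token)
        pending += 1
        if pending == read_n:
            pending = 0
            yield from burst()

    # Flush: voice any text left over after the last full read window.
    if pending:
        yield from burst()


def toy_unit(text: List[str], units: List[int]) -> int:
    """Hypothetical stand-in decoder: encodes (text length, units so far)."""
    return len(text) * 100 + len(units)


if __name__ == "__main__":
    tokens = "Hello there , how are you ?".split()
    for kind, value in stream_text_and_speech(tokens, toy_unit, read_n=3, write_n=2):
        print(kind, value)
```

Smaller read windows let audio start sooner but give the speech decoder less text context per burst. This is the kind of latency/quality trade-off a streaming decoder has to balance.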

Quick Start & Requirements

Installation: clone the repository, create a Conda environment with Python 3.10, and install the package with pip install -e .
Prerequisites: download the Whisper-large-v3 model, the CosyVoice 2 flow-matching model and vocoder, and a LLaMA-Omni2 model checkpoint from Hugging Face (e.g., ICTNLP/LLaMA-Omni2-7B-Bilingual).
Gradio demo: start a controller, a web server, and a model worker using the Python commands provided in the repository.
Local inference: inference scripts are also provided.
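Because the demo needs several separately downloaded artifacts, a quick preflight check before launching the controller, web server, and model worker can save a failed start. The sketch below is hypothetical. The directory layout in `REQUIRED` and the function `missing_prerequisites` are assumptions, not paths or helpers defined by the project, so adjust them to wherever you actually placed each download.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical local layout; replace with the paths you actually used.
REQUIRED: Dict[str, str] = {
    "Whisper-large-v3 speech encoder": "models/speech_encoder",
    "CosyVoice 2 flow-matching model and vocoder": "models/cosy2_decoder",
    "LLaMA-Omni2 checkpoint": "models/LLaMA-Omni2-7B-Bilingual",
}


def missing_prerequisites(root: Path, required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return a description of each required artifact whose directory is absent or empty."""
    missing: List[str] = []
    for name, rel in required.items():
        path = root / rel
        # A missing directory or an empty one both count as "not downloaded".
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(f"{name}: {path}")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites(Path.cwd())
    if problems:
        print("Missing before launching the demo:")
        for item in problems:
            print("  -", item)
    else:
        print("All prerequisites found.")
```

This only checks that the directories exist and are non-empty. It does not verify that the files are complete or compatible with the chosen checkpoint.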

Highlighted Details

Accepted to the ACL 2025 main conference.
Offers bilingual (English/Chinese) support for models like LLaMA-Omni2-7B-Bilingual.
Enables simultaneous text and speech generation for low-latency interaction.

Maintenance & Community

For questions, submit an issue or contact fangqingkai21b@ict.ac.cn. Commercial use inquiries should be directed to fengyang@ict.ac.cn.

Licensing & Compatibility

The code is released under the Apache-2.0 License. The models, however, are intended strictly for academic research and may not be used commercially; commercial use requires contacting the authors and obtaining a separate license.


Explore Similar Projects

LLMVoX by mbzuai-oryx

echogarden by echogarden-project

AIVoiceChat by KoljaB

FireRedTTS2 by FireRedTeam

ichigo by janhq

QuickAgent by gkamradt

LLaMA-Omni by ictnlp

mini-omni by gpt-omni

whisper_streaming by ufal

KittenTTS by KittenML

Zonos by Zyphra

piper by rhasspy