LLaMA-Omni2 by ictnlp

LLM-based real-time spoken chatbot

Created 8 months ago
253 stars

Top 99.4% on SourcePulse

Project Summary

LLaMA-Omni 2 is a series of speech-language models based on Qwen2.5 Instruct, designed for real-time spoken chatbots. It enables simultaneous text and speech generation, offering high-quality, low-latency interactions through a novel streaming autoregressive speech decoder. This project targets researchers and developers seeking advanced conversational AI with integrated speech capabilities.

How It Works

The system pairs a Qwen2.5 LLM with a streaming autoregressive speech decoder. Because the decoder produces speech incrementally rather than waiting for the complete text response, text and speech are generated simultaneously, which keeps latency low. The core innovation is this autoregressive, streaming speech generation, which improves both quality and responsiveness compared to previous methods.
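The idea of emitting speech while text is still being produced can be illustrated with a toy generator. This is only a sketch of an interleaving schedule, not the project's decoder: `fake_speech_units`, `interleave_stream`, `read_size`, and `units_per_token` are hypothetical names and values chosen for illustration, and the placeholder "speech units" stand in for whatever the real decoder emits.

```python
from typing import Iterable, Iterator, List, Tuple, Union

Event = Tuple[str, Union[str, List[str]]]


def fake_speech_units(text_chunk: List[str], units_per_token: int) -> List[str]:
    """Stand-in for a speech decoder (assumption): one placeholder unit
    string per text token, repeated `units_per_token` times."""
    return [f"<unit:{tok}:{i}>" for tok in text_chunk for i in range(units_per_token)]


def interleave_stream(
    text_tokens: Iterable[str],
    read_size: int = 3,        # hypothetical: text tokens read before speaking
    units_per_token: int = 2,  # hypothetical: speech units per text token
) -> Iterator[Event]:
    """Yield ("text", token) events and, after every `read_size` text tokens,
    a ("speech", units) event, so audio starts before the text is complete."""
    if read_size < 1:
        raise ValueError("read_size must be at least 1")
    pending: List[str] = []
    for tok in text_tokens:
        yield ("text", tok)
        pending.append(tok)
        if len(pending) == read_size:
            yield ("speech", fake_speech_units(pending, units_per_token))
            pending = []
    if pending:
        # Flush the final, shorter group of text tokens.
        yield ("speech", fake_speech_units(pending, units_per_token))


if __name__ == "__main__":
    for kind, payload in interleave_stream("Nice to meet you today".split()):
        print(kind, payload)
```

In this sketch the first speech chunk becomes available after `read_size` text tokens instead of after the whole reply, which is the latency benefit the streaming design is aiming for.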

Quick Start & Requirements

Installation consists of cloning the repository, creating a Conda environment with Python 3.10, and installing the package in editable mode (pip install -e .). Before running anything, download the required weights: the Whisper-large-v3 model, the CosyVoice 2 flow-matching model and vocoder, and a LLaMA-Omni2 checkpoint (e.g., ICTNLP/LLaMA-Omni2-7B-Bilingual) from Hugging Face. The Gradio demo is launched by starting a controller, a web server, and a model worker with the provided Python commands. Scripts for local inference are also available.
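Since the demo depends on several separately downloaded assets, a small preflight check can help catch a missing download before launching anything. The following is a minimal sketch, not part of the project: the `models` directory and the three subdirectory names are assumptions made for illustration, so adjust them to wherever the weights were actually saved.

```python
from pathlib import Path

# Hypothetical local layout (assumption): map each prerequisite named in the
# Quick Start to the subdirectory where it is expected to live.
REQUIRED_ASSETS = {
    "Whisper-large-v3 model": "whisper-large-v3",
    "CosyVoice 2 flow-matching model and vocoder": "cosyvoice2",
    "LLaMA-Omni2 checkpoint": "LLaMA-Omni2-7B-Bilingual",
}


def missing_assets(models_dir):
    """Return the descriptions of required assets that are absent or empty."""
    root = Path(models_dir)
    missing = []
    for description, subdir in REQUIRED_ASSETS.items():
        path = root / subdir
        # An asset counts as present only if its directory holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(description)
    return missing


if __name__ == "__main__":
    gaps = missing_assets("models")
    if gaps:
        print("Download these before launching the demo:")
        for item in gaps:
            print("  -", item)
    else:
        print("All prerequisites found.")
```

The check only looks for non-empty directories; it does not verify that the files inside are complete or match the expected model version.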

Highlighted Details

  • Accepted to the ACL 2025 main conference.
  • Offers bilingual (English/Chinese) support for models like LLaMA-Omni2-7B-Bilingual.
  • Enables simultaneous text and speech generation for low-latency interaction.

Maintenance & Community

For questions, submit an issue or contact fangqingkai21b@ict.ac.cn. Commercial use inquiries should be directed to fengyang@ict.ac.cn.

Licensing & Compatibility

The code is released under the Apache-2.0 License. However, the models are strictly intended for academic research purposes and may not be used commercially. A commercial license requires explicit contact and agreement.

Limitations & Caveats

The primary limitation is the strict non-commercial use restriction for the models, requiring separate licensing for any commercial application.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

4 stars in the last 30 days
