quillman by modal-labs

Real-time voice chat app with speech-to-speech LLM

Created 2 years ago
1,165 stars

Top 33.2% on SourcePulse

Project Summary

QuiLLMan is a voice chat application demonstrating speech-to-speech language model integration, aimed at developers building conversational AI applications. It delivers near-instantaneous, human-like conversational responses through streaming audio techniques and serves as a foundation for experimentation and custom language-model-based apps.

How It Works

The system utilizes Kyutai Lab's Moshi model for continuous listening, planning, and responding. It employs the Mimi streaming encoder/decoder for unbroken audio input/output and a speech-text foundation model to manage response timing. Bidirectional websocket streaming combined with the Opus audio codec enables low-latency communication, achieving response times that closely mimic human speech cadence on stable internet connections.
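The bidirectional-streaming pattern described above can be sketched with a toy example using only the Python standard library. Plain TCP lines stand in for websockets and Opus frames here, and every name in this snippet is illustrative rather than taken from the QuiLLMan codebase:

```python
import asyncio

# Toy full-duplex stream: the client keeps sending "audio" chunks while
# concurrently reading "responses", mirroring the bidirectional pattern
# QuiLLMan uses over websockets (plain TCP and text stand in here).

async def handle(reader, writer):
    # Server side: respond to each incoming chunk as soon as it arrives,
    # without waiting for the client to finish sending.
    while data := await reader.readline():
        writer.write(b"resp:" + data)
        await writer.drain()
    writer.close()

async def client(port, chunks):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    async def send():
        for c in chunks:
            writer.write(c + b"\n")
            await writer.drain()
        writer.write_eof()  # signal end of the outgoing stream

    async def recv():
        # Collect responses as they stream back, concurrently with send().
        return [line async for line in reader]

    _, responses = await asyncio.gather(send(), recv())
    writer.close()
    return responses

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    responses = await client(port, [b"chunk1", b"chunk2"])
    server.close()
    await server.wait_closed()
    return responses

responses = asyncio.run(main())
```

The key design point is that sending and receiving run concurrently on one connection, which is what lets the real system start streaming a reply before the user has finished speaking.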

Quick Start & Requirements

Development requires the modal Python package (pip install modal), a Modal account (modal setup), and a Modal API token (modal token new). The Moshi websocket server can be started locally with modal serve -m src.moshi. To test the websocket connection, install the development dependencies (pip install -r requirements/requirements-dev.txt) and run python tests/moshi_client.py. The frontend and HTTP server are served with modal serve src.app, and deployment is handled by modal deploy src.app. Changes are reloaded automatically, though frontend updates may require clearing the browser cache.
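Collected from the steps above, the setup sequence looks roughly like this (a sketch of the documented commands; account setup details will vary):

```shell
# Install the Modal client and authenticate (one-time setup)
pip install modal
modal setup          # link this machine to your Modal account
modal token new      # create an API token

# Serve the Moshi websocket backend locally with hot reload
modal serve -m src.moshi

# In another terminal: exercise the websocket connection
pip install -r requirements/requirements-dev.txt
python tests/moshi_client.py

# Serve the frontend + HTTP server, then deploy when ready
modal serve src.app
modal deploy src.app
```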

Highlighted Details

  • Powered by Kyutai Lab's Moshi speech-to-speech model.
  • Features Mimi streaming encoder/decoder for continuous audio.
  • Leverages bidirectional websockets and Opus codec for low-latency audio.
  • Intended as a starting point for language model-based applications.

Maintenance & Community

Contributions are explicitly welcomed. No specific community channels, maintainer information, or roadmap details are provided in the README.

Licensing & Compatibility

The README strongly advises users to check the specific license before any commercial use, indicating potential restrictions. No license type (e.g., MIT, Apache) is explicitly stated.

Limitations & Caveats

The code is provided primarily for illustration and experimentation. Users must independently verify licensing terms for commercial applications due to the lack of explicit licensing information.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google — 0.1%, 3k stars

Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago · Updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research — 0.1%, 6k stars

Unified text-to-text transformer for NLP research
Created 6 years ago · Updated 5 months ago