quillman by modal-labs

Real-time voice chat app with speech-to-speech LLM

Created 2 years ago
1,165 stars

Top 33.2% on SourcePulse

Project Summary

QuiLLMan is a voice chat application demonstrating speech-to-speech language model integration, aimed at developers building conversational AI applications. It delivers near-instantaneous, human-like conversational responses through streaming audio techniques and serves as a foundation for experimentation and custom language-model-based apps.

How It Works

The system utilizes Kyutai Lab's Moshi model for continuous listening, planning, and responding. It employs the Mimi streaming encoder/decoder for unbroken audio input/output and a speech-text foundation model to manage response timing. Bidirectional websocket streaming combined with the Opus audio codec enables low-latency communication, achieving response times that closely mimic human speech cadence on stable internet connections.
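The bidirectional-streaming pattern described above can be sketched with a toy example using only the Python standard library. Plain TCP lines stand in for websockets and Opus frames here, and every name in this snippet is illustrative rather than taken from the QuiLLMan codebase:

```python
import asyncio

# Toy full-duplex stream: the client keeps sending "audio" chunks while
# concurrently reading "responses", mirroring the bidirectional pattern
# QuiLLMan uses over websockets (plain TCP and text stand in here).

async def handle(reader, writer):
    # Server side: respond to each incoming chunk as soon as it arrives,
    # without waiting for the client to finish sending.
    while data := await reader.readline():
        writer.write(b"resp:" + data)
        await writer.drain()
    writer.close()

async def client(port, chunks):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    async def send():
        for c in chunks:
            writer.write(c + b"\n")
            await writer.drain()
        writer.write_eof()  # signal end of the outgoing stream

    async def recv():
        # Collect responses as they stream back, concurrently with send().
        return [line async for line in reader]

    _, responses = await asyncio.gather(send(), recv())
    writer.close()
    return responses

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    responses = await client(port, [b"chunk1", b"chunk2"])
    server.close()
    await server.wait_closed()
    return responses

responses = asyncio.run(main())
```

The key design point is that sending and receiving run concurrently on one connection, which is what lets the real system start streaming a reply before the user has finished speaking.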

Quick Start & Requirements

Development requires the modal Python package (pip install modal), a Modal account (modal setup), and a Modal API token (modal token new). The Moshi websocket server can be started locally with modal serve -m src.moshi. To test the websocket connection, install the development dependencies (pip install -r requirements/requirements-dev.txt) and run python tests/moshi_client.py. The frontend and HTTP server are served with modal serve src.app, and deployment is handled by modal deploy src.app. Changes are reloaded automatically, though frontend updates may require clearing the browser cache.
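Collected from the steps above, the setup sequence looks roughly like this (a sketch of the documented commands; account setup details will vary):

```shell
# Install the Modal client and authenticate (one-time setup)
pip install modal
modal setup          # link this machine to your Modal account
modal token new      # create an API token

# Serve the Moshi websocket backend locally with hot reload
modal serve -m src.moshi

# In another terminal: exercise the websocket connection
pip install -r requirements/requirements-dev.txt
python tests/moshi_client.py

# Serve the frontend + HTTP server, then deploy when ready
modal serve src.app
modal deploy src.app
```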

Highlighted Details

  • Powered by Kyutai Lab's Moshi speech-to-speech model.
  • Features Mimi streaming encoder/decoder for continuous audio.
  • Leverages bidirectional websockets and Opus codec for low-latency audio.
  • Intended as a starting point for language model-based applications.

Maintenance & Community

Contributions are explicitly welcomed. No specific community channels, maintainer information, or roadmap details are provided in the README.

Licensing & Compatibility

The README strongly advises users to check the specific license before any commercial use, indicating potential restrictions. No license type (e.g., MIT, Apache) is explicitly stated.

Limitations & Caveats

The code is provided primarily for illustration and experimentation. Users must independently verify licensing terms for commercial applications due to the lack of explicit licensing information.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google — 0.1%, 3k stars

Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago · Updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 16 more.

text-to-text-transfer-transformer by google-research — 0.1%, 6k stars

Unified text-to-text transformer for NLP research
Created 6 years ago · Updated 5 months ago