modal-labs: Real-time voice chat app with speech-to-speech LLM
Top 32.8% on SourcePulse
QuiLLMan is a voice chat application demonstrating speech-to-speech language model integration, aimed at developers building conversational AI applications. It delivers near-instantaneous, human-like conversational responses through streaming audio techniques, and serves as a foundation for experimentation and custom LLM-based apps.
How It Works
The system uses Kyutai Lab's Moshi model for continuous listening, planning, and responding. The Mimi streaming encoder/decoder provides unbroken audio input/output, while a speech-text foundation model manages response timing. Bidirectional websocket streaming combined with the Opus audio codec enables low-latency communication, achieving response times that closely mimic human speech cadence on stable internet connections.
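The duplex-streaming idea above can be sketched without any audio or network dependencies. In this illustrative Python sketch (not the project's actual code), asyncio queues stand in for the two directions of the websocket and byte strings stand in for Opus-encoded frames; the point is that the responder starts emitting output per incoming frame rather than waiting for a full utterance, which is what keeps perceived latency low.

```python
import asyncio

async def microphone(uplink: asyncio.Queue) -> None:
    """Producer: push small audio chunks as they are captured."""
    for i in range(3):
        await uplink.put(f"opus-frame-{i}".encode())
    await uplink.put(None)  # end-of-stream sentinel

async def model(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Echo-style stand-in for the speech-to-speech model: it begins
    responding as soon as frames arrive instead of buffering the whole
    utterance."""
    while (frame := await uplink.get()) is not None:
        await downlink.put(b"reply-to-" + frame)
    await downlink.put(None)

async def speaker(downlink: asyncio.Queue) -> list[bytes]:
    """Consumer: play back (here, collect) response frames as they stream in."""
    out = []
    while (frame := await downlink.get()) is not None:
        out.append(frame)
    return out

async def main() -> list[bytes]:
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    # Run capture, inference, and playback concurrently, as a real
    # bidirectional websocket session would.
    _, _, frames = await asyncio.gather(
        microphone(uplink), model(uplink, downlink), speaker(downlink)
    )
    return frames

if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the real app the two queues are replaced by the websocket's send and receive halves, and the frames carry Opus-compressed audio.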
Quick Start & Requirements
Development requires the modal Python package (pip install modal), a Modal account (modal setup), and an authenticated Modal token (modal token new). The Moshi websocket server can be started locally with modal serve -m src.moshi. To test the websocket connection, install the development dependencies (pip install -r requirements/requirements-dev.txt) and run python tests/moshi_client.py. The frontend and HTTP server are served via modal serve src.app, and deployment is handled by modal deploy src.app. Changes are reloaded automatically, though frontend updates may require clearing the browser cache.
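Collected as shell commands (paths are relative to the repository root, as given in the README):

```shell
# one-time setup
pip install modal            # Modal client package
modal setup                  # create / link a Modal account
modal token new              # provision an API token

# development loop: hot-reloading Moshi websocket server
modal serve -m src.moshi

# smoke-test the websocket connection
pip install -r requirements/requirements-dev.txt
python tests/moshi_client.py

# frontend + HTTP server
modal serve src.app

# production deployment
modal deploy src.app
```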
Maintenance & Community
Contributions are explicitly welcomed. No specific community channels, maintainer information, or roadmap details are provided in the README.
Licensing & Compatibility
The README strongly advises users to check the specific license before any commercial use, indicating potential restrictions. No license type (e.g., MIT, Apache) is explicitly stated.
Limitations & Caveats
The code is provided primarily for illustration and experimentation. Users must independently verify licensing terms for commercial applications due to the lack of explicit licensing information.