unmute by kyutai-labs

LLM voice and speech interface

Created 7 months ago

1,067 stars

Top 35.5% on SourcePulse

1 Expert Loves This Project

LaurentMazare

Cofounder of Kyutai

Project Summary

Unmute lets text-based Large Language Models (LLMs) listen and speak, enabling real-time voice conversations. It is aimed at users and developers who want to add speech capabilities to LLM applications, and it offers a low-latency, flexible system.

How It Works

Unmute uses a three-stage pipeline: a Speech-to-Text (STT) model transcribes the user's speech, an LLM processes the resulting text, and a Text-to-Speech (TTS) model converts the LLM's text response to speech. The architecture prioritizes low latency by optimizing the STT and TTS components, and it can integrate with various LLM backends such as vLLM or external APIs.
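To make the STT, LLM, TTS flow concrete, here is a minimal sketch of the three stages chained together. It is not Unmute's code: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, the stages are trivial stand-ins, and the sketch handles one whole utterance at a time, whereas a low-latency system would stream audio and text between stages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains three stages: speech -> text -> text -> speech."""
    stt: Callable[[bytes], str]   # audio in, transcript out
    llm: Callable[[str], str]     # transcript in, reply text out
    tts: Callable[[str], bytes]   # reply text in, audio out

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply_text = self.llm(transcript)
        return self.tts(reply_text)


# Hypothetical stand-ins: real stages would call actual models or services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond(b"hello")
print(reply_audio)  # b'You said: hello'
```

Because each stage is just a callable, swapping the LLM backend only means passing a different `llm` function, which mirrors the flexibility described above.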

Quick Start & Requirements

Installation: Recommended via Docker Compose (docker compose up --build).
Hardware: GPU with CUDA support and at least 16 GB memory.
OS: Linux or Windows with WSL. macOS is not supported.
Dependencies: NVIDIA Container Toolkit for Docker. Hugging Face Hub token for LLM access.
Setup: Docker Compose setup is described as "Very easy."
Documentation: Unmute.sh
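Before running Docker Compose, it can help to check that the listed prerequisites are present. The sketch below is an illustration under stated assumptions, not part of the project: `preflight` is a hypothetical helper, the environment variable name `HUGGING_FACE_HUB_TOKEN` is an assumption about how the token is supplied, and finding `nvidia-smi` on the PATH is only a rough proxy for a working NVIDIA driver (it does not verify the NVIDIA Container Toolkit or available GPU memory).

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def preflight(
    env: Optional[Mapping[str, str]] = None,
    token_var: str = "HUGGING_FACE_HUB_TOKEN",  # assumed variable name
) -> Dict[str, bool]:
    """Report which prerequisites appear to be available on this machine."""
    env = os.environ if env is None else env
    return {
        # Docker CLI found on PATH
        "docker_cli": shutil.which("docker") is not None,
        # nvidia-smi found on PATH (rough proxy for an NVIDIA driver)
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
        # A non-empty Hugging Face token is set in the environment
        "hf_token": bool(env.get(token_var, "").strip()),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```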

Highlighted Details

Achieves ~450ms TTS latency on a multi-GPU setup, down from ~750ms on a single GPU.
Supports running STT, TTS, and LLM on separate GPUs for performance gains.
The frontend is a Next.js app; it communicates with the backend via a protocol based on the OpenAI Realtime API.
Includes a load testing client for measuring latency and throughput.
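Latency and throughput figures like the ones above come from timing many requests and summarizing the results. The sketch below shows one generic way to do that; it is not the project's load testing client. `measure_latencies` and `summarize` are hypothetical helpers, `call` stands in for whatever request is being timed, and the 95th percentile uses the simple nearest-rank method.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(call: Callable[[], None], n_requests: int) -> List[float]:
    """Time n_requests sequential invocations of call, in milliseconds."""
    latencies_ms = []
    for _ in range(n_requests):
        start = time.perf_counter()
        call()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms: List[float], window_s: float) -> Dict[str, float]:
    """Summarize latencies observed over a wall-clock window of window_s seconds."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "mean_ms": statistics.fmean(ordered),
        "p50_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "throughput_rps": len(ordered) / window_s,
    }


stats = summarize([100.0, 200.0, 300.0, 400.0], window_s=2.0)
print(stats)
```

Reporting a tail percentile alongside the mean matters for voice interfaces, since occasional slow responses are what users notice in conversation.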

Maintenance & Community

The project actively encourages users to report issues when troubleshooting.
Development pointers are provided for modifying voices, prompts, and swapping frontends.
Contributions for features like tool calling are welcomed.

Licensing & Compatibility

No explicit license is mentioned in the README.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Native macOS support is not provided.
HTTPS support is omitted from default Docker Compose and Dockerless setups.
Docker Swarm deployment is documented as an internal setup, but no debugging support is offered for it.

Health Check

Last Commit: 2 weeks ago
Responsiveness: 1 day
Pull Requests (30d): 1
Issues (30d): 6

Star History

46 stars in the last 30 days

