WhisperFusion by Collabora

AI pipeline for real-time conversations

created 1 year ago
1,620 stars

Top 26.5% on sourcepulse

Project Summary

WhisperFusion enables seamless, ultra-low latency conversations with AI by integrating a Large Language Model (LLM) with a real-time speech-to-text pipeline. It targets users seeking highly responsive AI interaction, leveraging optimized TensorRT engines for both Whisper and the LLM.

How It Works

WhisperFusion uses Collabora's WhisperLive, a real-time speech-to-text pipeline built on OpenAI's Whisper, and integrates Mistral, a Large Language Model, for enhanced understanding. Both Whisper and the LLM are optimized as TensorRT engines for high-performance, low-latency processing. WhisperSpeech additionally benefits from torch.compile, which speeds up inference with JIT-compiled PyTorch kernels. This combination aims for maximum efficiency in conversational AI applications.

Quick Start & Requirements

  • Install/Run: docker compose build followed by docker compose up.
  • Prerequisites: GPU with at least 24 GB of VRAM (an RTX 4090, or a GPU with equivalent FP16 TFLOPS, is recommended for the lowest latency). Requires NVIDIA TensorRT-LLM.
  • Setup: Docker Compose setup includes pre-built TensorRT engines for Whisper and Phi, and a pre-downloaded WhisperSpeech model. A web GUI is available at http://localhost:8000.
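The quick-start steps above amount to two Docker Compose commands, followed by opening the GUI (the `curl` check is an optional extra, not part of the documented setup):

```shell
# Build the images (pre-built TensorRT engines for Whisper and Phi,
# plus the WhisperSpeech model, are included in the Compose setup).
docker compose build

# Start the services.
docker compose up

# Once running, the web GUI is served at http://localhost:8000 —
# optionally verify it responds:
curl -I http://localhost:8000
```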

Highlighted Details

  • Real-time speech-to-text via WhisperLive.
  • Integration of Mistral LLM for enhanced context.
  • TensorRT optimization for both Whisper and LLM for high performance.
  • torch.compile used for WhisperSpeech inference speedup.
  • Supports multiple GPUs via TensorRT-LLM for potential performance gains.

Maintenance & Community

Contact points are provided via email: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com. Issues can be opened directly on the repository.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Requires a high-end GPU with at least 24 GB of VRAM for optimal performance. No license is specified, which may impact commercial adoption.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
26 stars in the last 90 days
