AI pipeline for real-time conversations
WhisperFusion enables seamless, ultra-low latency conversations with AI by integrating a Large Language Model (LLM) with a real-time speech-to-text pipeline. It targets users seeking highly responsive AI interaction, leveraging optimized TensorRT engines for both Whisper and the LLM.
How It Works
WhisperFusion builds on Collabora's WhisperLive, which wraps OpenAI's Whisper for real-time speech-to-text, and integrates Mistral, a Large Language Model, for language understanding. Both Whisper and the LLM are compiled to TensorRT engines for high-performance, low-latency inference. WhisperSpeech, the text-to-speech component, is additionally accelerated with torch.compile, which JIT-compiles its PyTorch kernels. The combination aims to minimize end-to-end latency in conversational AI applications.
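As a rough illustration of the torch.compile technique described above (a minimal sketch; the module here is a stand-in, not the actual WhisperSpeech network):

```python
import torch

# Stand-in model with a mel-spectrogram-shaped interface (hypothetical,
# for illustration only -- not the real WhisperSpeech architecture).
model = torch.nn.Sequential(
    torch.nn.Linear(80, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 80),
).eval()

# torch.compile JIT-compiles the forward pass into fused kernels.
# The first call triggers compilation; subsequent calls reuse the
# compiled graph, which is where the inference speedup comes from.
compiled_model = torch.compile(model)

with torch.inference_mode():
    mel = torch.randn(1, 100, 80)  # dummy input batch
    out = compiled_model(mel)
    print(out.shape)  # torch.Size([1, 100, 80])
```

Note that the first invocation pays a one-time compilation cost, so the speedup only shows up on repeated inference calls, which is the typical pattern in a long-running conversational pipeline.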
Quick Start & Requirements
Build and launch the containers:

```
docker compose build
docker compose up
```

Once the containers are running, the web GUI is served at http://localhost:8000.
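As a quick sanity check that the service is reachable (a hypothetical snippet, assuming the default port 8000 above):

```python
import urllib.request

# Expect HTTP 200 once the web GUI container is up and serving.
with urllib.request.urlopen("http://localhost:8000", timeout=5) as resp:
    print(resp.status)
```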
Highlighted Details
torch.compile is used to speed up WhisperSpeech inference.
Maintenance & Community
Contact points are provided via email: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com. Issues can be opened directly on the repository.
Licensing & Compatibility
The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Optimal performance requires a high-end GPU with at least 24 GB of memory (VRAM). The license is unspecified, which may impact commercial adoption.