WhisperFusion by Collabora

AI pipeline for real-time conversations

created 1 year ago
1,620 stars

Top 26.5% on sourcepulse

Project Summary

WhisperFusion enables seamless, ultra-low latency conversations with AI by integrating a Large Language Model (LLM) with a real-time speech-to-text pipeline. It targets users seeking highly responsive AI interaction, leveraging optimized TensorRT engines for both Whisper and the LLM.

How It Works

WhisperFusion uses Collabora's WhisperLive, a real-time speech-to-text pipeline built on OpenAI's Whisper, and integrates Mistral, a Large Language Model, for enhanced understanding. Both Whisper and the LLM are optimized as TensorRT engines for high-performance, low-latency processing. WhisperSpeech additionally benefits from torch.compile, which speeds up inference with JIT-compiled PyTorch kernels. This combination aims for maximum efficiency in conversational AI applications.

Quick Start & Requirements

  • Install/Run: docker compose build followed by docker compose up.
  • Prerequisites: GPU with at least 24 GB of VRAM (an RTX 4090, or a GPU with equivalent FP16 TFLOPS, is recommended for the lowest latency). Requires NVIDIA TensorRT-LLM.
  • Setup: Docker Compose setup includes pre-built TensorRT engines for Whisper and Phi, and a pre-downloaded WhisperSpeech model. A web GUI is available at http://localhost:8000.
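The quick-start steps above amount to two Docker Compose commands, followed by opening the GUI (the `curl` check is an optional extra, not part of the documented setup):

```shell
# Build the images (pre-built TensorRT engines for Whisper and Phi,
# plus the WhisperSpeech model, are included in the Compose setup).
docker compose build

# Start the services.
docker compose up

# Once running, the web GUI is served at http://localhost:8000 —
# optionally verify it responds:
curl -I http://localhost:8000
```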

Highlighted Details

  • Real-time speech-to-text via WhisperLive.
  • Integration of Mistral LLM for enhanced context.
  • TensorRT optimization for both Whisper and LLM for high performance.
  • torch.compile used for WhisperSpeech inference speedup.
  • Supports multiple GPUs via TensorRT-LLM for potential performance gains.

Maintenance & Community

Contact points are provided via email: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com. Issues can be opened directly on the repository.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Requires a high-end GPU with at least 24 GB of VRAM for optimal performance. No license is specified, which may impact commercial adoption.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
26 stars in the last 90 days
