whispering-ui  by Sharrnah

Native UI for live audio transcription/translation

created 2 years ago
280 stars

Top 94.0% on sourcepulse

Project Summary

This project provides a native UI for the Whispering Tiger application, a tool for real-time audio transcription and translation. It targets users who need to integrate live speech-to-text and translation into various applications like streaming overlays or VRChat, offering a user-friendly interface for configuration and control.

How It Works

The UI acts as a control layer for the Whispering Tiger backend. It manages audio input capture (including loopback capture of system sounds), AI model selection for speech-to-text and translation, and output configuration via WebSockets or OSC. GPU acceleration is available via CUDA on NVIDIA GPUs. Users balance accuracy against performance by choosing the model size and precision level, and the selected models are downloaded automatically.
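
To make the OSC output path concrete, here is a minimal sketch of how a text result could be packed into an OSC 1.0 message and sent over UDP using only the Python standard library. The address `/chatbox/input`, the host `127.0.0.1`, and port `9000` are assumptions based on VRChat's commonly documented OSC defaults; they are not taken from the Whispering Tiger documentation, and the function names are hypothetical.

```python
import socket
import struct


def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 requires."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *args) -> bytes:
    """Encode one OSC message with string, int, float, or bool arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):  # check bool first: bool is a subclass of int
            tags += "T" if arg else "F"  # booleans carry no payload bytes
        elif isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            tags += "s"
            payload += _pad(arg.encode("utf-8"))
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg).__name__}")
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


def send_text(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send text to an assumed chatbox address (hypothetical target)."""
    packet = osc_message("/chatbox/input", text, True)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Calling `send_text("hello")` would emit a single UDP datagram. Because OSC runs over UDP, nothing confirms that a receiver is listening, which is one reason a WebSocket output may be preferable when delivery matters.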

Quick Start & Requirements

  • Download the latest release from the Releases Page.
  • Extract to a folder on a drive with sufficient free space.
  • Run Whispering Tiger.exe.
  • Optional but recommended: Install CUDA for NVIDIA GPU acceleration.
  • Initial run downloads the Whispering Tiger platform and AI models.
  • Setup involves creating a profile, selecting audio devices, and configuring AI model parameters.
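
Before the first run, the two environment-related steps above (free disk space and NVIDIA GPU availability) can be checked with a short script. This is a sketch under stated assumptions: the 20 GiB threshold is a placeholder rather than a documented requirement, `preflight` is a hypothetical helper, and finding `nvidia-smi` on the PATH only indicates that an NVIDIA driver is installed, not that CUDA itself is set up.

```python
import shutil


def preflight(install_dir: str, min_free_gib: float = 20.0) -> dict:
    """Report free space on the target drive and whether nvidia-smi is on PATH.

    min_free_gib is an assumed placeholder, not a documented requirement.
    """
    free_gib = shutil.disk_usage(install_dir).free / 2**30
    return {
        "free_gib": round(free_gib, 1),
        "enough_space": free_gib >= min_free_gib,
        # Presence of nvidia-smi suggests an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    print(preflight("."))
```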

Highlighted Details

  • Native UI for Windows, with potential future Linux support.
  • Supports transcription/translation of audio streams and in-game images.
  • Integrated Text-to-Speech (TTS) with Silero and F5 support.
  • Plugin architecture for extended functionality (e.g., Realtime Subtitles, RVC).
  • Loopback audio capture for system audio without extra tools.
  • Auto-update functionality for the Whispering Tiger backend.

Maintenance & Community

  • Project has a Discord server for additional help.

Licensing & Compatibility

  • The README does not explicitly state the license for whispering-ui. The linked whispering repository is MIT licensed.

Limitations & Caveats

  • Currently Windows-focused, with Linux support pending.
  • AI model download status is not displayed during the initial setup.
  • Memory consumption estimates are rough and can vary.
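
To show why such memory estimates are only rough, the sketch below computes a lower bound from parameter count and precision alone. It assumes the speech-to-text models are OpenAI Whisper checkpoints (approximate parameter counts taken from the Whisper README) and that the precision options map to 4, 2, and 1 bytes per weight; both are assumptions, not details from this project. Activations, decoding buffers, and any translation or TTS models come on top of this figure.

```python
# Approximate Whisper parameter counts (assumed model family).
WHISPER_PARAMS = {
    "tiny": 39e6,
    "base": 74e6,
    "small": 244e6,
    "medium": 769e6,
    "large": 1550e6,
}

# Bytes stored per weight for each assumed precision option.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weights_gib(size: str, precision: str) -> float:
    """Lower-bound memory for the model weights alone, in GiB."""
    return WHISPER_PARAMS[size] * BYTES_PER_WEIGHT[precision] / 2**30


if __name__ == "__main__":
    for size in WHISPER_PARAMS:
        row = ", ".join(
            f"{p}: {weights_gib(size, p):.2f} GiB" for p in BYTES_PER_WEIGHT
        )
        print(f"{size:>6}  {row}")
```

Under these assumptions, halving the precision halves the weight footprint, while moving up one model size costs roughly two to three times more.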

Health Check

  • Last commit: 15 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 90 days
