whispering by Sharrnah

Live transcription/translation tool with OSC and WebSocket support

created 2 years ago
456 stars

Top 67.3% on sourcepulse

Project Summary

Whispering Tiger is an open-source, locally-run tool for real-time speech-to-text transcription and translation, as well as optical character recognition (OCR) and text-to-speech (TTS). It targets streamers, VRChat users, and developers needing live audio/visual processing, offering integration via WebSockets and OSC for overlays and in-app use.

How It Works

The project combines several AI models for its core functions. For speech processing, it supports OpenAI's Whisper, Meta's Seamless M4T, Microsoft's SpeechT5, and NVIDIA's NeMo Canary, enabling transcription and translation across many languages. OCR is handled by EasyOCR and Microsoft's Phi-4 Multimodal LLM, which capture text from screen images. TTS is provided by Silero, F5/E2-TTS, Kokoro TTS, and Zonos TTS, with voice cloning support. Everything is designed to run locally, which keeps latency low and keeps audio and images on the user's machine once the models are downloaded.

Quick Start & Requirements

  • Installation: Download standalone releases from the GitHub Releases page.
  • Prerequisites: CUDA for GPU acceleration is recommended. Models can consume up to 20 GB of disk space.
  • Usage: Run one of the provided .bat files (e.g., start-transcribe-mic.bat). Parameters can be changed by editing the file in a text editor or by passing command-line flags. A native UI application is available at https://github.com/Sharrnah/whispering-ui for easier management.
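
Because the models can take up to 20 GB, it may be worth checking free disk space before the first run. The following is a small sketch using Python's standard `shutil.disk_usage`. The helper names and the choice of checking the current directory are illustrative assumptions, since the summary does not say where models are stored.

```python
import shutil

# Upper bound quoted in the prerequisites above.
REQUIRED_GB = 20


def free_space_gb(path: str = ".") -> float:
    """Return the free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room_for_models(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the volume containing `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder where the release is unpacked.
    if has_room_for_models("."):
        print("Enough free space for the model downloads.")
    else:
        print(f"Less than {REQUIRED_GB} GB free; consider another drive.")
```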

Highlighted Details

  • Supports 98 languages for transcription and up to 200 for translation via models like NLLB-200 and Seamless M4T.
  • Integrates OCR for in-game text capture and translation.
  • Features TTS with voice cloning and RVC (Retrieval-based Voice Conversion).
  • Includes LLM integration for text continuation and Q&A via plugins.

Maintenance & Community

The project acknowledges contributions from OpenAI, Meta, Microsoft, and others. Community links are not explicitly provided in the README.

Licensing & Compatibility

The project's licensing is not explicitly stated in the provided README text.

Limitations & Caveats

Initial model downloads can be substantial (up to 20 GB). The README mentions a 2 GB limit on GitHub releases, necessitating downloads from external links. Some LLM integrations are noted as proof-of-concept.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 90 days
