Speech-Translate by Dadangdut33

Speech-to-text app using Whisper for transcription and translation

Created 2 years ago
609 stars

Top 53.9% on SourcePulse

Project Summary

This project provides a real-time speech transcription and translation application, leveraging OpenAI's Whisper and free translation APIs. It's designed for users needing live speech-to-text, speech translation, or batch audio/video file processing, offering a user-friendly Tkinter interface.

How It Works

The application integrates OpenAI's Whisper ASR model for accurate speech-to-text and utilizes free translation APIs for language conversion. It supports live microphone input and batch processing of audio/video files, outputting transcriptions and translations in various formats (.txt, .srt, .vtt, etc.). A customizable subtitle window is available for live outputs.
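
As an illustration of the subtitle output step, here is a minimal sketch of turning timed segments into .srt text. It assumes segments shaped like the dictionaries in Whisper's transcription result (`start` and `end` in seconds, plus `text`). The helper names `srt_timestamp` and `segments_to_srt` are hypothetical and are not taken from the Speech-Translate codebase.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render segments (assumed keys: start, end, text) as SRT cue blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        # Each cue: sequence number, time range, text, then a blank separator.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ]
    print(segments_to_srt(demo))
```

The .vtt format is similar in structure but uses a `WEBVTT` header and a period instead of a comma before the milliseconds, so the same segment data could feed both writers.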

Quick Start & Requirements

  • Installation:
    • Prebuilt Binary (.exe): Download from releases. Requires a CUDA 11.8-compatible GPU.
    • As a Module: Install with pip, then run with speech-translate.
      • GPU: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118
      • CPU: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git
    • From Git: Clone the repo, set up a virtual environment, run pip install -r requirements.txt (add --extra-index-url https://download.pytorch.org/whl/cu118 for GPU), then run Run.py.
  • Prerequisites: Python 3.8+ (3.11 recommended). A CUDA-compatible GPU is recommended for performance. Capturing speaker output as input requires Windows 8+; on other systems, use a loopback tool. An internet connection is required for API translation and model downloads. The Noto Emoji font is recommended for the UI.
  • Resources: Whisper models range from ~39M parameters (tiny) to ~1.5B parameters (large), requiring roughly 1GB to 10GB of VRAM.
  • Docs: Wiki
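
To make the resource figures concrete, here is a minimal sketch of choosing a Whisper model size for a given VRAM budget. The table is an assumption based on the approximate figures in the openai/whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for large); actual usage varies with settings. The function name `largest_model_that_fits` is hypothetical and not part of Speech-Translate.

```python
from typing import List, Optional, Tuple

# Approximate VRAM needs in GB, ordered from smallest to largest model.
# Assumed values, taken from the openai/whisper README; treat as rough guides.
APPROX_VRAM_GB: List[Tuple[str, float]] = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(vram_gb: float) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits the budget."""
    choice = None
    for name, needed_gb in APPROX_VRAM_GB:
        if needed_gb <= vram_gb:
            choice = name  # later entries are larger, so keep overwriting
    return choice


if __name__ == "__main__":
    for budget in (0.5, 2, 6, 12):
        print(budget, "GB ->", largest_model_that_fits(budget))
```

A result of `None` means even the smallest model is unlikely to fit on the GPU, in which case the CPU install is the safer choice.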

Highlighted Details

  • Supports live transcription and translation from microphone input.
  • Batch processing for audio/video files with multiple output formats.
  • Customizable subtitle window for real-time display.
  • Option to integrate local LibreTranslate for offline use.
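
For the offline option, here is a minimal sketch of calling a locally hosted LibreTranslate server through its `/translate` endpoint. The base URL `http://localhost:5000` is an assumption (LibreTranslate's usual default), and the helper names are hypothetical. This does not show how Speech-Translate itself talks to LibreTranslate.

```python
import json
import urllib.request


def build_translate_request(
    text: str,
    source: str,
    target: str,
    base_url: str = "http://localhost:5000",  # assumed local server address
) -> urllib.request.Request:
    """Build a POST request for LibreTranslate's /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text: str, source: str, target: str, **kwargs: str) -> str:
    """Send the request and return the translated text from the JSON reply."""
    request = build_translate_request(text, source, target, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["translatedText"]


if __name__ == "__main__":
    # Requires a LibreTranslate server running at the assumed address.
    print(translate("Hello, world", "en", "id"))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.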

Maintenance & Community

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • Prebuilt binaries are Windows-only and require CUDA 11.8.
  • Speaker input works only on Windows 8+; other operating systems need an alternative audio capture method, such as a loopback tool.
  • Build script is currently only configured for Windows.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days
