Speech-Translate by Dadangdut33

Speech-to-text app using Whisper for transcription and translation

created 2 years ago
599 stars

Top 55.3% on sourcepulse

Project Summary

This project provides a real-time speech transcription and translation application built on OpenAI's Whisper and free translation APIs. It targets users who need live speech-to-text, speech translation, or batch processing of audio/video files, and it offers a user-friendly Tkinter interface.

How It Works

The application integrates OpenAI's Whisper ASR model for accurate speech-to-text and utilizes free translation APIs for language conversion. It supports live microphone input and batch processing of audio/video files, outputting transcriptions and translations in various formats (.txt, .srt, .vtt, etc.). A customizable subtitle window is available for live outputs.
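
To make the subtitle output concrete, here is a minimal sketch of turning timed transcript segments into .srt cues. It is not the project's own exporter: the function names format_timestamp and segments_to_srt are hypothetical, and the input is assumed to be a list of dicts with "start", "end", and "text" keys, which is the shape of the "segments" list in a Whisper transcription result.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render an iterable of {'start', 'end', 'text'} dicts as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each cue is: running number, time range, text, then a blank line.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


# Hypothetical segments, shaped like the "segments" list in a Whisper result.
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.04, "text": " How are you?"},
]
print(segments_to_srt(example))
```

The .vtt format is close to this; it mainly differs in its "WEBVTT" header line and in using a period instead of a comma before the milliseconds.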

Quick Start & Requirements

  • Installation:
    • Prebuilt Binary (.exe): Download from releases. Requires a CUDA 11.8-compatible GPU.
    • As a Module: Install with pip, then run with speech-translate.
      • GPU: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118
      • CPU: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git
    • From Git: Clone the repo, set up a virtual environment, run pip install -r requirements.txt (add --extra-index-url for GPU), then run Run.py.
  • Prerequisites: Python 3.8+ (3.11 recommended). GPU with CUDA compatibility recommended for performance. Windows 8+ for speaker input (or use loopback tools). Internet connection required for API translation and model downloads. Noto Emoji font recommended for UI.
  • Resources: Whisper models range from ~39M parameters (tiny) to ~1.5B parameters (large), requiring VRAM from ~1GB to 10GB+.
  • Docs: Wiki
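
As a rough illustration of the VRAM trade-off, the sketch below picks the largest Whisper model that fits a given amount of GPU memory. The per-model figures are the approximate values from the table in OpenAI's Whisper README, not measurements of this application, and pick_model is a hypothetical helper rather than part of Speech-Translate.

```python
# Approximate VRAM needs in GB, per the table in OpenAI's Whisper README.
# Listed from largest to smallest so the first fit is the most capable model.
APPROX_VRAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits."""
    for name, needed in APPROX_VRAM_GB:
        if available_vram_gb >= needed:
            return name
    # Below ~1 GB nothing fits comfortably; fall back to the smallest model.
    return "tiny"


for vram in (0.5, 1, 4, 8, 12):
    print(vram, "GB ->", pick_model(vram))
```

Actual memory use also depends on the backend, precision, and audio length, so treat the result as a starting point.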

Highlighted Details

  • Supports live transcription and translation from microphone input.
  • Batch processing for audio/video files with multiple output formats.
  • Customizable subtitle window for real-time display.
  • Option to integrate local LibreTranslate for offline use.
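
For the offline option, a request to a locally hosted LibreTranslate server might look like the sketch below. It uses LibreTranslate's documented /translate endpoint and its "translatedText" response field, but the address (localhost on port 5000), the helper names build_request and translate, and the timeout are assumptions. How Speech-Translate itself talks to LibreTranslate is not described here.

```python
import json
import urllib.request

# Assumed address of a locally hosted LibreTranslate server; adjust as needed.
LIBRETRANSLATE_URL = "http://localhost:5000/translate"


def build_request(text, source, target, url=LIBRETRANSLATE_URL):
    """Build the JSON POST request for LibreTranslate's /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, source="auto", target="en", url=LIBRETRANSLATE_URL):
    """Send the request and return the translated string from the response."""
    request = build_request(text, source, target, url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["translatedText"]
```

With a server running at that address, translate("Bonjour le monde", source="fr", target="en") would return the English text. Without one, the call raises a urllib.error.URLError.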

Maintenance & Community

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • Prebuilt binaries are Windows-only and require CUDA 11.8.
  • Speaker input requires Windows 8+; other operating systems need an alternative audio capture method, such as a loopback tool.
  • Build script is currently only configured for Windows.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 28 stars in the last 90 days
