Gradio web UI for video translation with synchronized audio
SoniTranslate is a web application that translates videos into multiple languages and produces synchronized audio. It targets users who need to localize video content and exposes the workflow through a Gradio interface. The project aims to make video translation accessible to a broader audience.
How It Works
SoniTranslate runs a staged pipeline. It first transcribes the original audio with models such as WhisperX or faster-whisper, then translates the transcript with tools such as deep-translator or OpenAI's GPT API. Finally, it synthesizes the translated text into speech with one of several Text-to-Speech (TTS) engines, including Piper TTS, Coqui XTTS, and OpenVoiceV2, and synchronizes the new audio with the original video. Because the stages are modular, the transcription, translation, and TTS components can be chosen independently.
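The modular design described above can be illustrated with a minimal sketch. This is not SoniTranslate's actual code: the `Segment` class, the `translate_video_audio` function, and the three stub backends are hypothetical names invented for illustration, and the stubs stand in for real engines such as faster-whisper, deep-translator, or Piper TTS. The sketch only shows how each stage can be a swappable callable and how segment timestamps are carried through so the new audio can be aligned with the video.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One timed span of speech; start and end are in seconds."""
    start: float
    end: float
    text: str


# Each stage is a plain callable, so a backend can be replaced without
# touching the rest of the pipeline (all three types are hypothetical).
Transcriber = Callable[[str], List[Segment]]   # audio path -> timed segments
Translator = Callable[[str, str], str]         # (text, target language) -> text
Synthesizer = Callable[[str, float], bytes]    # (text, max seconds) -> audio bytes


def translate_video_audio(
    audio_path: str,
    target_lang: str,
    transcribe: Transcriber,
    translate: Translator,
    synthesize: Synthesizer,
) -> List[Tuple[Segment, bytes]]:
    """Run transcription, translation, and TTS, keeping original timings."""
    dubbed = []
    for seg in transcribe(audio_path):
        text = translate(seg.text, target_lang)
        # Pass the original duration so a real TTS stage could fit the slot.
        audio = synthesize(text, seg.end - seg.start)
        dubbed.append((Segment(seg.start, seg.end, text), audio))
    return dubbed


# Stub backends (assumptions for the demo, not real engine calls).
def fake_transcribe(audio_path: str) -> List[Segment]:
    return [Segment(0.0, 1.5, "hello"), Segment(1.5, 3.0, "world")]


def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def fake_synthesize(text: str, max_duration: float) -> bytes:
    return text.encode("utf-8")


result = translate_video_audio(
    "input.wav", "es", fake_transcribe, fake_translate, fake_synthesize
)
for segment, audio in result:
    print(segment.start, segment.end, segment.text, len(audio))
```

Swapping a backend means passing a different callable with the same signature. The final step of placing each clip at its `start` time and muxing it with the video is left out of the sketch.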
Quick Start & Requirements
Dependencies are listed in requirements_base.txt and requirements_extra.txt.
Run the app with python app_rvc.py.
Highlighted Details
Maintenance & Community
The project is actively updated, with recent changes including OpenAI API integration, new output formats, and expanded language support. Community contributions are welcomed via issues and pull requests.
Licensing & Compatibility
The code is licensed under Apache 2.0. However, the README notes that models or weights, such as those from pyannote-audio, may have commercial restrictions. Users should verify model licenses for commercial use.
Limitations & Caveats
While the code is Apache 2.0 licensed, the use of certain models (e.g., pyannote for diarization) may impose commercial restrictions. Compatibility with all websites for YouTube playlist processing is not guaranteed.