SoniTranslate by R3gm

Gradio web UI for video translation with synchronized audio

Created 2 years ago

1,330 stars

Top 29.8% on SourcePulse

Project Summary

SoniTranslate is a web application that translates videos into multiple languages and produces synchronized audio. It targets users who need to localize video content and exposes the whole workflow through a Gradio interface, with the aim of making video translation accessible to a broader audience.

How It Works

SoniTranslate runs a three-stage pipeline. It first transcribes the original audio using models such as WhisperX or faster-whisper. It then translates the transcript using deep-translator or OpenAI's GPT API. Finally, it synthesizes the translated text into speech using one of several Text-to-Speech (TTS) engines, including Piper TTS, Coqui XTTS, and OpenVoiceV2, and synchronizes the new audio with the original video. Because the stages are modular, the transcription, translation, and TTS components can be chosen independently.
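
The transcribe, translate, synthesize flow described above can be sketched as three swappable callables. This is a minimal illustration of the modular idea, not SoniTranslate's actual code: the names Segment, translate_video, and the fake_* stand-ins are all hypothetical, and the stand-ins do no real speech or translation work.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One transcribed utterance with its position in the original audio."""
    start: float  # seconds
    end: float    # seconds
    text: str


# Hypothetical stage signatures; each backend only has to match its shape.
Transcriber = Callable[[str], List[Segment]]   # audio path -> segments
Translator = Callable[[str, str], str]         # (text, target language) -> text
Synthesizer = Callable[[str, float], bytes]    # (text, target duration) -> audio


def translate_video(
    audio_path: str,
    target_lang: str,
    transcribe: Transcriber,
    translate: Translator,
    synthesize: Synthesizer,
) -> List[Tuple[Segment, str, bytes]]:
    """Run the three stages in order, keeping each segment's original timing."""
    dubbed = []
    for seg in transcribe(audio_path):
        text = translate(seg.text, target_lang)
        # Pass the original duration so a real TTS stage could fit the slot.
        audio = synthesize(text, seg.end - seg.start)
        dubbed.append((seg, text, audio))
    return dubbed


# Stand-in stages (assumptions for illustration only).
def fake_transcribe(path: str) -> List[Segment]:
    return [Segment(0.0, 1.5, "hello"), Segment(1.5, 3.0, "world")]


def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def fake_synthesize(text: str, duration: float) -> bytes:
    return text.encode("utf-8")


result = translate_video(
    "input.wav", "es", fake_transcribe, fake_translate, fake_synthesize
)
```

Swapping a backend then means passing a different callable for one stage while the other two stay untouched.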

Quick Start & Requirements

Installation: Requires setting up a Conda environment, installing PyTorch with CUDA 11.8, cloning the repository, and installing dependencies via requirements_base.txt and requirements_extra.txt.
Prerequisites: NVIDIA drivers with CUDA 11.8.0, Anaconda/Miniconda, Git, FFmpeg, and a Hugging Face account with an access token (with read access to gated repos).
Running: Activate the Conda environment and run python app_rvc.py.
Resources: A GPU is recommended for good performance; a CPU mode is also available.
Links: Colab Notebook, Repository, Online DEMO, Video Tutorial
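
Before installing, it can help to confirm that the listed prerequisites are actually present. The following is a hedged sketch using only the Python standard library; the function name missing_prerequisites is hypothetical, and the environment variable name YOUR_HF_TOKEN is an assumption about how the Hugging Face token is supplied, so check the project README for the exact name.

```python
import os
import shutil

# Command-line tools named in the prerequisites above.
REQUIRED_TOOLS = ("conda", "git", "ffmpeg")


def missing_prerequisites(token_var="YOUR_HF_TOKEN", which=shutil.which, env=None):
    """Return a list of prerequisites that could not be found.

    token_var is an assumed variable name; which and env are injectable
    so the check can be exercised without touching the real system.
    """
    env = os.environ if env is None else env
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not env.get(token_var):
        missing.append(f"environment variable {token_var}")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites: " + ", ".join(problems))
    else:
        print("All listed prerequisites were found.")
```

This only checks that the tools are on PATH and that the token variable is non-empty; it does not verify the CUDA version or the token's access rights.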

Highlighted Details

Supports over 100 languages for translation and transcription.
Offers multiple TTS options including voice cloning with OpenVoiceV2 and Coqui XTTS.
Features advanced options like subtitle generation (SRT, ASS), batch processing, and CPU mode.
Integrates the OpenAI API for enhanced transcription, translation, and TTS capabilities.
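
To make the subtitle feature concrete, here is a small sketch of how timed segments map onto the SRT format (a counter, a start and end timestamp written as HH:MM:SS,mmm, the text, then a blank line). The helper names srt_timestamp and to_srt are hypothetical and this is not taken from SoniTranslate's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


example = to_srt([(0.0, 1.5, "Hola"), (1.5, 3.25, "mundo")])
```

Writing the returned string to a .srt file with UTF-8 encoding is enough for most players to load it.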

Maintenance & Community

The project is actively updated, with recent changes including OpenAI API integration, new output formats, and expanded language support. Community contributions are welcomed via issues and pull requests.

Licensing & Compatibility

The code is licensed under Apache 2.0. However, the README notes that models or weights, such as those from pyannote-audio, may have commercial restrictions. Users should verify model licenses for commercial use.

Limitations & Caveats

Some models used by the project (e.g., pyannote for diarization) may impose commercial restrictions even though the code itself is Apache 2.0 licensed. Compatibility with all websites for YouTube playlist processing is not guaranteed.

Health Check

Last Commit

2 months ago

Responsiveness

1 day

Star History

18 stars in the last 30 days