Auto-Synced-Translated-Dubs by ThioJoe

CLI tool for auto-synced, translated video dubs

created 2 years ago
1,667 stars

Top 25.9% on sourcepulse

Project Summary

This project provides an automated workflow for translating and dubbing video content using AI. It targets content creators and distributors looking to localize videos efficiently, enabling them to reach a wider audience by generating synchronized, translated audio tracks from existing subtitle files.

How It Works

The core process uses subtitle timings from SRT files to orchestrate AI-driven translation and text-to-speech synthesis. The subtitle text is first translated with a service such as Google Cloud Translate or DeepL. The tool then calculates the target duration for each translated speech segment from the original subtitle timings, synthesizes new audio clips with neural voices, and stretches or compresses each clip to fit its original time slot, keeping the dub aligned with the original speech. An optional "two-pass" synthesis mode improves audio quality by measuring the first pass and re-synthesizing at an adjusted speaking rate, reducing how much stretching is needed afterward.
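The timing logic above can be sketched in a few lines: parse the SRT timestamps, derive each segment's target duration, and compute the speed factor needed to fit a synthesized clip into its slot. This is a minimal illustration of the idea, not the project's actual code; the function names are hypothetical.

```python
def srt_time_to_ms(t: str) -> int:
    """Convert an SRT timestamp like '00:01:02,500' to milliseconds."""
    h, m, rest = t.split(":")
    s, ms = rest.split(",")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def segment_duration_ms(start: str, end: str) -> int:
    """Duration the dubbed clip must occupy, taken from the original subtitle timings."""
    return srt_time_to_ms(end) - srt_time_to_ms(start)

def stretch_factor(synthesized_ms: int, target_ms: int) -> float:
    """Speed multiplier so the synthesized clip fits the original slot.
    A value > 1 means the clip must be sped up; < 1 means slowed down."""
    return synthesized_ms / target_ms
```

In a two-pass setup, the factor from the first pass would be fed back to the TTS service as a speaking-rate parameter, so the second synthesis already lands close to the target duration.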

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • External requirements: ffmpeg (required), rubberband (optional, for audio stretching).
  • API access and configuration: Google Cloud, Microsoft Azure, and/or DeepL API keys/tokens must be set up in cloud_service_settings.ini.
  • Configuration: Customize settings in config.ini and language/voice preferences in batch.ini.
  • Run: python main.py
  • More info: Wiki
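As an illustration of the configuration step, a credentials file such as cloud_service_settings.ini might look like the sketch below. The section and key names here are assumptions for illustration only; check the template files shipped with the repository for the actual keys.

```ini
; cloud_service_settings.ini — illustrative only; key names are assumptions
[CLOUD]
; Google Cloud project ID, if using Google translation/TTS
google_project_id = my-project-id
; DeepL API key, if using DeepL for translation
deepl_api_key = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
; Azure Speech resource key and region, if using Azure TTS
azure_speech_key = 0123456789abcdef
azure_speech_region = eastus
```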

Highlighted Details

  • Supports multiple AI services for translation (Google Translate, DeepL) and TTS (Azure, Google Cloud, ElevenLabs).
  • Includes additional tools for managing YouTube video metadata (titles, descriptions, audio tracks) and subtitle files.
  • Offers fine-grained control over translation and synthesis, including "Don't Translate" lists and custom pronunciation.
  • Batch processing allows for sequential translation and dubbing into multiple languages.

Maintenance & Community

The project appears to be actively maintained by ThioJoe. Further community engagement details (Discord, Slack, roadmap) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository's license is not specified in the provided README. Users should verify licensing for commercial use and integration with closed-source projects.

Limitations & Caveats

The current implementation assumes a single speaker per video. For multi-speaker content, separate SRT files for each speaker are recommended, requiring manual merging of generated audio tracks. The quality of pauses between lines is dependent on the source SRT file's formatting; SRTs generated with specific settings (e.g., from Descript) are preferred. The "two-pass" synthesis feature, while improving quality, doubles API costs.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 20 stars in the last 90 days
