YouDub by liuzhao1225

Open-source tool for translating/dubbing YouTube videos into Chinese

created 1 year ago
406 stars

Top 72.7% on sourcepulse

Project Summary

YouDub is an open-source tool that automates translating and dubbing YouTube videos into Chinese while preserving the original speaker's voice. It targets content creators and viewers who want to localize high-quality video content for the Chinese internet. The result is a Chinese-dubbed video that keeps the original YouTuber's vocal characteristics.

How It Works

YouDub chains several AI models into one pipeline. OpenAI's Whisper transcribes the speech, a large language model (such as GPT-3.5-turbo or GPT-4) translates the transcript into Chinese, and AI voice cloning, currently via Paddle Speech, generates Chinese audio that mimics the original speaker's tone and timbre. The system integrates these steps so that the output audio and video stay synchronized.
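
The order of the stages can be sketched as below. This is a minimal illustration, not YouDub's actual code: the Segment and DubbedSegment types and the dub function are hypothetical names, and the three stages are passed in as plain callables so that any speech-to-text, translation, or TTS backend could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One transcribed utterance (hypothetical type, for illustration only)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # source-language transcript


@dataclass
class DubbedSegment:
    """A translated utterance with its synthesized audio (hypothetical type)."""
    start: float
    end: float
    text_zh: str
    audio: bytes


def dub(
    audio_path: str,
    transcribe: Callable[[str], List[Segment]],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> List[DubbedSegment]:
    """Run speech-to-text, translation, and TTS in order.

    The original timestamps are carried through unchanged so that a later
    step can align the new audio with the video.
    """
    dubbed = []
    for seg in transcribe(audio_path):
        text_zh = translate(seg.text)
        audio = synthesize(text_zh)
        dubbed.append(DubbedSegment(seg.start, seg.end, text_zh, audio))
    return dubbed
```

In a real run, transcribe would wrap Whisper, translate would call the configured LLM, and synthesize would call the voice-cloning backend; keeping them as parameters makes each stage easy to swap or to stub out in tests.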

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt. If you need GPU support, make sure PyTorch is installed for the matching CUDA version, e.g., pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118.
  • Environment variables: Configure a .env file with OPENAI_API_KEY, MODEL_NAME, and HF_TOKEN (for speaker diarization). Optionally set OPENAI_API_BASE, or APPID and ACCESS_TOKEN for an alternative TTS service.
  • Run: python main.py --input_folders /path/to/input --output_folders /path/to/output [--diarize]
  • Prerequisites: Python, PyTorch (CPU or GPU), an OpenAI API key, a Hugging Face token (optional), and, if the alternative TTS is used, Volcano Engine TTS credentials (may be a paid service).
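
Before the first run it can help to check that the .env file has the settings listed above. The sketch below uses only the standard library. The helper names parse_env and missing_settings are hypothetical, and treating OPENAI_API_KEY and MODEL_NAME as always required, with HF_TOKEN required only for diarization, is an assumption based on this summary rather than on YouDub's code.

```python
from typing import Dict, List

# Assumed to be required for every run (an assumption, see the note above).
REQUIRED = ("OPENAI_API_KEY", "MODEL_NAME")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_settings(env: Dict[str, str], diarize: bool = False) -> List[str]:
    """Return the names of required settings that are absent or empty."""
    needed = list(REQUIRED)
    if diarize:
        needed.append("HF_TOKEN")  # the token is used for speaker diarization
    return [name for name in needed if not env.get(name)]
```

For example, missing_settings(parse_env(open(".env", encoding="utf-8").read()), diarize=True) returns an empty list when everything needed for a --diarize run is present.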

Highlighted Details

  • Utilizes Whisper for speech recognition, with plans to evaluate WhisperX for improved performance.
  • Supports OpenAI API for translation and offers flexibility for integrating other LLMs.
  • Voice cloning currently uses Paddle Speech, with considerations for Coqui AI TTS.
  • Includes video processing for audio-video synchronization and accurate subtitle generation.
  • Optional --diarize flag enables speaker diarization using pyannote.
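
As an illustration of the subtitle step, timed translated segments can be written out as subtitle text. The sketch below assumes the SubRip (SRT) format; this summary does not say which subtitle format YouDub produces, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(blocks)
```

Because each cue reuses the start and end times from transcription, the subtitles stay aligned with the original speech even though the text has been translated.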

Maintenance & Community

  • Project welcomes contributions via GitHub Issues and Pull Requests.
  • Contact via GitHub Issues; a WeChat group is available via QR code in the README.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Users must comply with copyright, data protection, and privacy laws; redistribution requires the original creator's permission.

Limitations & Caveats

The current voice cloning backend (Paddle Speech) cannot synthesize a sentence that mixes Chinese and English. Some TTS options (Volcano Engine) may incur costs. Using GPT-4 for translation can be expensive.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 90 days
