CLI tool for fast YouTube English video translation to Chinese
This project provides a fast, end-to-end solution for translating English YouTube videos into Chinese, aimed at content creators and researchers who need to localize video content efficiently. It pairs high-quality text translation with automated audio dubbing to reduce manual effort, in particular the manual text correction that reportedly accounts for about 90% of the time in other workflows.
How It Works
The system processes videos through a series of modular, serial steps, each producing intermediate files that can be inspected or reused. Key stages include downloading the video, extracting audio, separating vocals from background music using a provided model, transcribing English audio to text with faster-whisper, merging and translating the text (with a strong recommendation for DeepL), converting translated text to speech using edge-tts or GPT-SoVITS, and finally merging the synthesized audio with the original video. The serial nature allows users to start from any step and reuse specific components.
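The serial, resumable design described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the step names, the intermediate file names, and the `run_pipeline` helper are all hypothetical, and each step is a stub that only writes a placeholder file where the real tool (ffmpeg, faster-whisper, edge-tts, and so on) would run.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Step:
    name: str                      # hypothetical step name
    output: str                    # hypothetical intermediate file name
    run: Callable[[Path], None]    # does the work and writes `output`


def make_stub(output: str) -> Callable[[Path], None]:
    """Return a stand-in for a real step; it only writes a placeholder file."""
    def _run(work_dir: Path) -> None:
        (work_dir / output).write_text(f"placeholder for {output}\n")
    return _run


# Order follows the stages listed in the text; all names are assumptions.
STEP_SPECS = [
    ("download", "video.mp4"),
    ("extract_audio", "audio.wav"),
    ("separate_vocals", "vocals.wav"),
    ("transcribe", "en.srt"),
    ("translate", "zh.srt"),
    ("tts", "zh_voice.wav"),
    ("merge", "final.mp4"),
]
STEPS = [Step(name, output, make_stub(output)) for name, output in STEP_SPECS]


def run_pipeline(work_dir: Path, start: str = "download") -> List[str]:
    """Run steps in order, beginning at `start`; return the names executed."""
    names = [step.name for step in STEPS]
    if start not in names:
        raise ValueError(f"unknown step: {start}")
    first = names.index(start)
    # Resuming mid-pipeline only works if earlier intermediate files exist.
    for step in STEPS[:first]:
        if not (work_dir / step.output).exists():
            raise FileNotFoundError(f"missing {step.output} from step {step.name}")
    executed = []
    for step in STEPS[first:]:
        step.run(work_dir)
        executed.append(step.name)
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        print(run_pipeline(work))                     # full run
        print(run_pipeline(work, start="translate"))  # reuse earlier outputs
```

Because every step reads and writes files in the work directory, rerunning from `translate` after editing the English subtitles by hand would reuse the earlier download, extraction, and transcription outputs unchanged.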
Quick Start & Requirements
Install: pip install -r requirements.txt
Prerequisites: ffmpeg (added to PATH), PyTorch (GPU version), faster-whisper models (download or self-managed).
Configuration: paramDict.json file with YouTube video ID, work path, and optional proxy/API keys.
Highlighted Details
Uses faster-whisper for efficient English speech-to-text transcription.
Uses edge-tts for voice generation, with experimental GPT-SoVITS integration.
Maintenance & Community
pytube, ffmpeg, faster-whisper, edge-tts, and GPT-SoVITS.
Licensing & Compatibility
Limitations & Caveats
The project is not especially user-friendly: the author prefers that users modify the source directly rather than work through a GUI. Downloading faster-whisper models can be slow or blocked in some regions. The GPT-SoVITS TTS option is noted as potentially unstable. The final audio output might be truncated if the last subtitle segment is too short.
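For the model download caveat, one possible workaround is to point faster-whisper at a self-managed local copy of the model. The sketch below is an assumption about how that could be wired up, not the project's own code: `resolve_model_source` and `load_model` are hypothetical helpers, the `"large-v3"` fallback name and the `device`/`compute_type` values are illustrative, and the check for `model.bin` assumes the usual layout of a converted CTranslate2 model directory.

```python
from pathlib import Path


def resolve_model_source(local_dir: str, fallback_name: str = "large-v3") -> str:
    """Prefer a self-managed model directory; otherwise return a model name.

    A converted CTranslate2 model directory normally contains `model.bin`,
    so its presence is used here as a simple sanity check (an assumption).
    """
    path = Path(local_dir)
    if (path / "model.bin").is_file():
        return str(path)
    return fallback_name


def load_model(local_dir: str, fallback_name: str = "large-v3"):
    """Load a Whisper model, avoiding a download when a local copy exists."""
    # Imported lazily so the resolver above can be used without the package.
    from faster_whisper import WhisperModel

    source = resolve_model_source(local_dir, fallback_name)
    # GPU settings are illustrative; adjust to the available hardware.
    return WhisperModel(source, device="cuda", compute_type="float16")
```

When the local directory is missing, the fallback name would still trigger a download, so this only helps once the model files have been obtained by some other route.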