pytvzhen by CuSO4Gem

CLI tool for fast YouTube English video translation to Chinese

Created 1 year ago

346 stars

Top 80.2% on SourcePulse

Project Summary

This project provides a fast, end-to-end pipeline for translating English YouTube videos into Chinese, aimed at content creators and researchers who need to localize video content efficiently. It combines high-quality text translation with automated audio dubbing, with the goal of cutting down the manual text correction that reportedly accounts for about 90% of the time spent in other workflows.

How It Works

The system processes videos through a series of modular, serial steps, each producing intermediate files that can be inspected or reused. Key stages include downloading the video, extracting audio, separating vocals from background music using a provided model, transcribing English audio to text with faster-whisper, merging and translating the text (with a strong recommendation for DeepL), converting translated text to speech using edge-tts or GPT-SoVITS, and finally merging the synthesized audio with the original video. The serial nature allows users to start from any step and reuse specific components.
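
The serial, file-based design described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the step names, file names, and the `run_pipeline` and `stub_step` helpers are all hypothetical, and each step is a stub that only writes a placeholder file where the real tool would call pytube, ffmpeg, faster-whisper, and so on.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical step table: (step name, output file name). The real project
# defines its own steps and file names; these only mirror the stages above.
STEPS: List[Tuple[str, str]] = [
    ("download_video", "video.mp4"),
    ("extract_audio", "audio.wav"),
    ("separate_vocals", "vocals.wav"),
    ("transcribe", "en.srt"),
    ("merge_and_translate", "zh.srt"),
    ("synthesize_speech", "zh.wav"),
    ("merge_into_video", "video_zh.mp4"),
]


def stub_step(name: str, output: Path) -> None:
    """Placeholder for real work: record which step produced the file."""
    output.write_text(f"produced by {name}\n", encoding="utf-8")


def run_pipeline(
    work_dir: Path,
    start_from: str = "download_video",
    step_fn: Callable[[str, Path], None] = stub_step,
) -> List[str]:
    """Run the steps in order, beginning at `start_from`.

    Earlier steps are skipped, so their intermediate files must already
    exist in `work_dir`. Returns the names of the steps that actually ran.
    """
    names = [name for name, _ in STEPS]
    if start_from not in names:
        raise ValueError(f"unknown step: {start_from}")
    first = names.index(start_from)

    # Every skipped step must have left its output file behind.
    for name, filename in STEPS[:first]:
        if not (work_dir / filename).exists():
            raise FileNotFoundError(f"{filename} missing; run {name} first")

    work_dir.mkdir(parents=True, exist_ok=True)
    ran = []
    for name, filename in STEPS[first:]:
        step_fn(name, work_dir / filename)
        ran.append(name)
    return ran
```

Calling `run_pipeline(Path("work"), start_from="synthesize_speech")` after a full run would redo only the last two stages, which is the kind of reuse the intermediate files make possible, for example when retrying with a different TTS voice.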

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: NVIDIA GPU (CUDA required, CPU fallback possible via source modification), Python 3.9.19, ffmpeg (added to PATH), PyTorch (GPU version), faster-whisper models (download or self-managed).
Setup: Requires downloading models and configuring a paramDict.json file with YouTube video ID, work path, and optional proxy/API keys.
Docs: Demo Video
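
As a rough illustration of the setup step, a parameter file like paramDict.json could be loaded and checked as shown below. The key names (`video_id`, `work_path`, `proxy`, `deepl_api_key`) and the `load_params` helper are assumptions made for this sketch; the summary only says the file holds a YouTube video ID, a work path, and optional proxy and API key settings, so the project's README remains the reference for the real names.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical key names, chosen for illustration only.
REQUIRED_KEYS = ("video_id", "work_path")
OPTIONAL_DEFAULTS: Dict[str, Any] = {"proxy": None, "deepl_api_key": None}

# Example file content using the hypothetical keys above.
EXAMPLE_JSON = '{"video_id": "VIDEO_ID_HERE", "work_path": "./work"}'


def load_params(path: Path) -> Dict[str, Any]:
    """Read a JSON parameter file, check required keys, fill in defaults."""
    params = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(params, dict):
        raise ValueError("parameter file must contain a JSON object")
    # Treat absent or empty values as missing.
    missing = [key for key in REQUIRED_KEYS if not params.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    # Values from the file override the optional defaults.
    return {**OPTIONAL_DEFAULTS, **params}
```

Failing early on a missing video ID or work path avoids discovering the problem only after the slower download and model-loading stages have started.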

Highlighted Details

High-quality text translation via sentence merging and DeepL integration, reducing manual correction time.
Modular, serial execution allows for flexible workflow customization and debugging.
Utilizes faster-whisper for efficient English speech-to-text transcription.
Supports edge-tts for voice generation, with experimental GPT-SoVITS integration.
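
Speech recognizers usually emit short timed segments that can break mid-sentence, and translating such fragments one at a time tends to produce worse results than translating whole sentences. The summary does not describe how the project merges sentences, so the following is only one plausible approach: accumulate segments until one ends in sentence-final punctuation, then emit a single merged segment that spans their combined time range. The `Segment` type and `merge_into_sentences` function are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List

SENTENCE_END = (".", "?", "!")


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def _join(parts: List[Segment]) -> Segment:
    """Combine consecutive fragments into one segment covering all of them."""
    text = " ".join(part.text for part in parts)
    return Segment(parts[0].start, parts[-1].end, text)


def merge_into_sentences(segments: Iterable[Segment]) -> List[Segment]:
    """Group timed fragments so that each output segment is a full sentence."""
    merged: List[Segment] = []
    pending: List[Segment] = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip empty fragments
        pending.append(Segment(seg.start, seg.end, text))
        if text.endswith(SENTENCE_END):
            merged.append(_join(pending))
            pending = []
    if pending:  # trailing fragment without closing punctuation
        merged.append(_join(pending))
    return merged
```

A punctuation-only rule like this is deliberately simple: it would split incorrectly after abbreviations such as "Dr." and relies on the transcript containing punctuation at all, so a real implementation would likely need extra heuristics.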

Maintenance & Community

The README's detailed workflow descriptions and parameter explanations suggest active development.
Community Q&A group: 697357405.
Relies on several upstream open-source projects including pytube, ffmpeg, faster-whisper, edge-tts, and GPT-SoVITS.

Licensing & Compatibility

The README does not explicitly state a license. Upstream projects have various licenses (e.g., MIT, GPL). Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as not especially user-friendly, because the author prefers direct source modification over building a GUI. Downloading faster-whisper models can be slow or blocked in some regions. The GPT-SoVITS TTS option is noted as potentially unstable. The final audio output may be truncated if the last subtitle segment is too short.

Explore Similar Projects

openlrc by zh-plus

YouDub by liuzhao1225

Modelscope_Faster_Whisper_Multi_Subtitle by v3ucn

Speech-Translate by Dadangdut33

generate-subtitles by mayeaux

yt-whisper by m1guelpf

SoniTranslate by R3gm

Linly-Dubbing by Kedreamix

voice-pro by abus-aikorea

VideoCaptioner by WEIFENG2333

seamless_communication by facebookresearch

pyvideotrans by jianchang512