pytvzhen by CuSO4Gem

CLI tool for fast YouTube English video translation to Chinese

created 1 year ago
333 stars

Top 83.6% on sourcepulse

Project Summary

This project provides a fast, end-to-end pipeline for translating English YouTube videos into Chinese, aimed at content creators and researchers who need to localize video content efficiently. It combines high-quality text translation with automated audio dubbing, with the goal of cutting the manual text correction that typically takes about 90% of the time in other workflows.

How It Works

The system processes videos through a series of modular, serial steps, each producing intermediate files that can be inspected or reused. Key stages include downloading the video, extracting audio, separating vocals from background music using a provided model, transcribing English audio to text with faster-whisper, merging and translating the text (with a strong recommendation for DeepL), converting translated text to speech using edge-tts or GPT-SoVITS, and finally merging the synthesized audio with the original video. The serial nature allows users to start from any step and reuse specific components.
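The serial, resumable design described above can be illustrated with a minimal sketch. It is not the project's code: the step names, the intermediate file names, and the `run_pipeline` helper are all hypothetical, and each step is a placeholder that only writes a short text file where the real tool would call the downloader, ffmpeg, faster-whisper, a translator, or a TTS engine.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# (step name, output file name, step function). All names are hypothetical.
Step = Tuple[str, str, Callable[[Optional[Path], Path], None]]


def make_placeholder(label: str) -> Callable[[Optional[Path], Path], None]:
    """Build a stand-in step that only records which input it consumed."""
    def step(src: Optional[Path], dst: Path) -> None:
        upstream = src.name if src is not None else "nothing"
        dst.write_text(f"{label} <- {upstream}\n", encoding="utf-8")
    return step


STEPS: List[Step] = [
    ("download", "video.mp4", make_placeholder("download")),
    ("extract_audio", "audio.wav", make_placeholder("extract_audio")),
    ("separate_vocals", "vocals.wav", make_placeholder("separate_vocals")),
    ("transcribe", "transcript_en.srt", make_placeholder("transcribe")),
    ("translate", "transcript_zh.srt", make_placeholder("translate")),
    ("synthesize", "dub_zh.wav", make_placeholder("synthesize")),
    ("mux", "video_zh.mp4", make_placeholder("mux")),
]


def run_pipeline(work_dir: Path,
                 steps: List[Step] = STEPS,
                 start_from: Optional[str] = None) -> List[str]:
    """Run the steps in order, optionally resuming from a named step.

    Steps before `start_from` are skipped, so their intermediate files
    must already exist in `work_dir` from an earlier run.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    names = [name for name, _, _ in steps]
    first = names.index(start_from) if start_from else 0

    executed: List[str] = []
    previous: Optional[Path] = None
    for index, (name, filename, func) in enumerate(steps):
        output = work_dir / filename
        if index >= first:
            if previous is not None and not previous.exists():
                raise FileNotFoundError(
                    f"step {name!r} needs {previous.name} from an earlier run"
                )
            func(previous, output)
            executed.append(name)
        previous = output
    return executed


if __name__ == "__main__":
    work = Path("work_demo")
    print(run_pipeline(work))                          # full run
    print(run_pipeline(work, start_from="translate"))  # reuse earlier files
```

Because every step reads one file and writes another, a resumed run only needs the output of the step immediately before the starting point, which is what makes it practical to hand-edit an intermediate file (for example the English transcript) and rerun the remaining steps.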

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: NVIDIA GPU with CUDA (running on CPU is possible only by modifying the source), Python 3.9.19, ffmpeg (added to PATH), PyTorch (GPU version), faster-whisper models (downloaded by the tool or managed manually).
  • Setup: Requires downloading models and configuring a paramDict.json file with YouTube video ID, work path, and optional proxy/API keys.
  • Docs: Demo Video
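
Before installing, it can help to confirm the prerequisites listed above. The following is a small preflight sketch, not part of the project: the helper names are hypothetical, and it only checks that ffmpeg is on PATH, that the interpreter is Python 3.9, and, if PyTorch is already installed, whether it can see a CUDA device.

```python
import importlib.util
import shutil
import sys
from typing import Callable, Optional, Sequence


def ffmpeg_on_path(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return which("ffmpeg") is not None


def python_matches(expected: Sequence[int] = (3, 9),
                   actual: Optional[Sequence[int]] = None) -> bool:
    """Return True if the major.minor version equals `expected`."""
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


def cuda_available() -> Optional[bool]:
    """Return None if PyTorch is not importable, else whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return None
    try:
        import torch
    except ImportError:
        return None
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("ffmpeg on PATH:", ffmpeg_on_path())
    print("Python 3.9:", python_matches())
    cuda = cuda_available()
    print("CUDA:", "PyTorch not installed" if cuda is None else cuda)
```

A `False` from `cuda_available()` usually means a CPU-only PyTorch build or a missing driver; in that case the tool would need the source modification mentioned in the prerequisites.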

Highlighted Details

  • High-quality text translation via sentence merging and DeepL integration, reducing manual correction time.
  • Modular, serial execution allows for flexible workflow customization and debugging.
  • Utilizes faster-whisper for efficient English speech-to-text transcription.
  • Supports edge-tts for voice generation, with experimental GPT-SoVITS integration.
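
The summary does not say how sentence merging is implemented. One plausible reading, sketched below under that assumption, is that speech-to-text segments that break mid-sentence are joined into whole sentences before translation, so the translator sees complete context. The `Segment` class is a hypothetical stand-in for a transcription segment with start time, end time, and text; the punctuation rule is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import List, Optional

# Simplified rule: a sentence ends at one of these characters.
SENTENCE_END = (".", "?", "!")


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment."""
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_into_sentences(segments: List[Segment]) -> List[Segment]:
    """Join consecutive segments until the text ends a sentence.

    The merged segment keeps the start of its first piece and the end of
    its last piece, so timing stays usable for dubbing later.
    """
    merged: List[Segment] = []
    pending: Optional[Segment] = None
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip empty segments
        if pending is None:
            pending = Segment(seg.start, seg.end, text)
        else:
            pending = Segment(pending.start, seg.end, f"{pending.text} {text}")
        if pending.text.endswith(SENTENCE_END):
            merged.append(pending)
            pending = None
    if pending is not None:
        merged.append(pending)  # keep a trailing unfinished sentence
    return merged


if __name__ == "__main__":
    pieces = [
        Segment(0.0, 1.5, "This project translates"),
        Segment(1.5, 3.0, "English videos into Chinese."),
        Segment(3.0, 4.0, "It also dubs them"),
    ]
    for sentence in merge_into_sentences(pieces):
        print(sentence)
```

A real implementation would also have to handle abbreviations, decimal numbers, and very long sentences, none of which this sketch attempts.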

Maintenance & Community

  • The README documents the workflow and its parameters in detail, but the last commit was about a year ago, so the project may not be under active development.
  • Community Q&A group: 697357405.
  • Relies on several upstream open-source projects including pytube, ffmpeg, faster-whisper, edge-tts, and GPT-SoVITS.

Licensing & Compatibility

  • The README does not explicitly state a license. Upstream projects have various licenses (e.g., MIT, GPL). Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as not user-friendly: the author prefers that users modify the source directly rather than building a GUI. Downloading faster-whisper models can be slow or blocked in some regions. The GPT-SoVITS TTS option is noted as potentially unstable. The final audio output might be truncated if the last subtitle segment is too short.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days
