youtube-auto-dub  by mangodxd

Automated YouTube video dubbing and subtitling pipeline

Created 3 months ago
266 stars

Top 96.1% on SourcePulse

Project Summary

This project provides an AI-powered pipeline for automatically dubbing and subtitling YouTube videos, targeting users who need a free, locally run, and customizable solution. It addresses the limitations of existing tools by giving users full control over the output and by orchestrating the whole pipeline on the user's own machine.

How It Works

The pipeline orchestrates several open-source components. YouTube videos are downloaded via yt-dlp and transcribed using Whisper ASR with timestamped output, and the transcript is segmented into natural speech chunks. These segments are translated using Google Translate (RPC, with a scraping fallback) and synthesized into speech via Edge TTS. Finally, FFmpeg renders the result: the synthesized audio is mixed with the original video or replaces its audio track, and subtitles are burned in when requested. This approach prioritizes local execution and cost-effectiveness, although the translation and speech-synthesis steps call online services and therefore need a network connection.
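
The segmentation step can be illustrated with a minimal sketch. It assumes the ASR output is a list of timestamped segments and merges neighbours separated by short pauses, up to a length cap. The `Segment` class, the `merge_segments` function, and both thresholds are hypothetical names and values chosen for illustration; the project's actual segmentation logic may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def merge_segments(segments: List[Segment], max_gap: float = 0.5,
                   max_duration: float = 10.0) -> List[Segment]:
    """Merge adjacent segments separated by short pauses into speech chunks.

    max_gap and max_duration are illustrative defaults, not project settings.
    """
    chunks: List[Segment] = []
    for seg in segments:
        if chunks:
            last = chunks[-1]
            gap = seg.start - last.end
            merged_duration = seg.end - last.start
            if gap <= max_gap and merged_duration <= max_duration:
                # Short pause and still under the length cap: extend the chunk.
                chunks[-1] = Segment(last.start, seg.end,
                                     f"{last.text} {seg.text.strip()}")
                continue
        # Long pause, or the chunk would grow too long: start a new chunk.
        chunks.append(Segment(seg.start, seg.end, seg.text.strip()))
    return chunks
```

Each resulting chunk keeps the start time of its first segment and the end time of its last, so a synthesized line can later be aligned to the original timing.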

Quick Start & Requirements

  • Install: Clone the repository (git clone https://github.com/mangodxd/youtube-auto-dub.git), cd into the directory, and run pip install -r requirements.txt.
  • Prerequisites: Python 3.8+, FFmpeg installed and available in the system PATH.
  • GPU Support: For faster Whisper inference, install PyTorch with CUDA support: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118.
  • Usage: Execute python main.py "YOUTUBE_URL" with optional flags for language, mode (subtitles, dubbing, or both), and voice gender.
  • Docs: https://github.com/mangodxd/youtube-auto-dub
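
Before the first run, the prerequisites above can be verified with a small preflight check. This is a sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper and is not part of the repository.

```python
import shutil
import sys
from typing import List, Sequence, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8),
                        tools: Sequence[str] = ("ffmpeg",)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems: List[str] = []
    if sys.version_info < min_python:
        found = ".".join(str(n) for n in sys.version_info[:3])
        wanted = ".".join(str(n) for n in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `check_prerequisites()` and printing the returned list shows whether FFmpeg is visible to Python before any download or transcription work starts.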

Highlighted Details

  • Fully automated pipeline from YouTube URL to dubbed/subtitled video.
  • Supports custom language and voice mapping via language_map.json.
  • Handles age-restricted or private videos by leveraging browser cookies.
  • Offers flexibility in choosing Whisper model size for performance/accuracy trade-offs.
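
As an illustration of language and voice mapping, the sketch below resolves a target language and voice gender to a TTS voice name. The JSON schema shown (language code, then gender, then voice name) is an assumption made for this example and may not match the real language_map.json. `pick_voice` is a hypothetical helper, and the voice names are examples whose availability should be checked against the TTS service.

```python
import json
from typing import Dict

# Assumed schema for illustration only: language code -> gender -> voice name.
EXAMPLE_MAP_JSON = """
{
  "en": {"male": "en-US-GuyNeural", "female": "en-US-JennyNeural"},
  "es": {"male": "es-ES-AlvaroNeural", "female": "es-ES-ElviraNeural"}
}
"""


def pick_voice(language_map: Dict[str, Dict[str, str]], lang: str,
               gender: str = "female") -> str:
    """Return the configured voice for a language, falling back across genders."""
    voices = language_map.get(lang)
    if not voices:
        raise KeyError(f"no voices configured for language {lang!r}")
    if gender in voices:
        return voices[gender]
    # Requested gender is missing: fall back to the first configured voice.
    return next(iter(voices.values()))


language_map = json.loads(EXAMPLE_MAP_JSON)
```

With this layout, adding a language amounts to adding one entry to the JSON file, for example `pick_voice(language_map, "es", "male")` would return the Spanish male voice configured above.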

Maintenance & Community

Issues and Pull Requests are actively accepted. Contribution guidelines are provided, particularly for adding new languages and voices. No specific community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The project is released under the MIT License, which permits commercial use, modification, and redistribution provided the copyright and license notice are retained.

Limitations & Caveats

Users must ensure FFmpeg is correctly added to their system PATH. CUDA memory issues may necessitate using smaller Whisper models. Potential issues with translation splitting, YouTube rate limits, or TTS voice availability are documented with workarounds. Advanced features like speaker diarization, background music separation, and voice conversion are under active development.
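
The CUDA memory caveat can be handled by choosing the model size from the memory that is actually free. The sketch below is one possible approach, not the project's own logic. The VRAM figures are rough estimates in line with those published for the reference openai-whisper models, and a different Whisper implementation may need less. `pick_whisper_model`, the headroom value, and the CPU default are assumptions.

```python
from typing import List, Optional, Tuple

# Approximate VRAM needs in GB, largest model first. Treat these as estimates.
MODEL_VRAM_GB: List[Tuple[str, float]] = [
    ("large", 10.0),
    ("medium", 5.0),
    ("small", 2.0),
    ("base", 1.0),
    ("tiny", 1.0),
]


def pick_whisper_model(free_vram_gb: Optional[float],
                       headroom_gb: float = 0.5) -> str:
    """Pick the largest model that fits in free VRAM with some headroom."""
    if free_vram_gb is None:
        # No GPU detected: an arbitrary, conservative default for CPU runs.
        return "base"
    for name, needed_gb in MODEL_VRAM_GB:
        if free_vram_gb >= needed_gb + headroom_gb:
            return name
    # Nothing fits comfortably: fall back to the smallest model.
    return "tiny"
```

On a machine with PyTorch and CUDA, the free memory could be obtained from `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device.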

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 57 stars in the last 30 days
