mangodxd/youtube-auto-dub: Automated YouTube video dubbing and subtitling pipeline
This project provides an AI-powered pipeline for automatically dubbing and subtitling YouTube videos, targeting users who need a free, local, and customizable solution. It addresses the limitations of existing tools by giving users full control over the output and running the pipeline on their own machine, although the translation and speech synthesis steps call online services (Google Translate and Edge TTS).
How It Works
The pipeline orchestrates several open-source components: YouTube videos are downloaded via yt-dlp, transcribed using Whisper ASR with timestamped output, and then segmented into natural speech chunks. These segments are translated using Google Translate (RPC or scraping fallback) and synthesized into speech via Edge TTS. Finally, the synthesized audio is mixed with the original video, and subtitles are burned or the audio track is replaced using FFmpeg for rendering. This approach prioritizes local execution and cost-effectiveness.
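The segmentation step can be illustrated with a minimal sketch. It assumes the transcript arrives as a list of dictionaries with "start", "end", and "text" keys, which is the shape of the segments list returned by the openai-whisper package. The Chunk class, the group_segments function, and the max_gap and max_duration thresholds are hypothetical names and values chosen for illustration; the project's actual segmentation logic may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Chunk:
    start: float  # chunk start time in seconds
    end: float    # chunk end time in seconds
    text: str     # merged transcript text


def group_segments(
    segments: Iterable[Dict],
    max_gap: float = 0.6,        # assumed pause threshold in seconds
    max_duration: float = 10.0,  # assumed upper bound on chunk length in seconds
) -> List[Chunk]:
    """Merge consecutive timestamped segments into speech chunks.

    A new chunk starts when the pause before a segment exceeds max_gap,
    when merging would make the chunk longer than max_duration, or when
    the previous chunk already ends with sentence punctuation.
    """
    chunks: List[Chunk] = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments
        if chunks:
            last = chunks[-1]
            gap = seg["start"] - last.end
            too_long = seg["end"] - last.start > max_duration
            ended_sentence = last.text.endswith((".", "!", "?"))
            if gap <= max_gap and not too_long and not ended_sentence:
                # Extend the current chunk instead of starting a new one.
                last.end = seg["end"]
                last.text = f"{last.text} {text}"
                continue
        chunks.append(Chunk(seg["start"], seg["end"], text))
    return chunks


example = [
    {"start": 0.0, "end": 1.2, "text": " Hello and welcome"},
    {"start": 1.3, "end": 2.5, "text": " to the channel."},
    {"start": 4.0, "end": 5.0, "text": " Today we look at dubbing."},
]
chunks = group_segments(example)
for c in chunks:
    print(f"{c.start:.1f}-{c.end:.1f}: {c.text}")
```

Grouping by pause and sentence boundary keeps each translated and synthesized unit close to a complete thought, which tends to make it easier to align the dubbed audio with the original timing.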
Quick Start & Requirements
Clone the repository (git clone https://github.com/mangodxd/youtube-auto-dub.git), cd into the directory, and run pip install -r requirements.txt.
For GPU support with CUDA 11.8, install PyTorch with pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Run python main.py "YOUTUBE_URL" with optional flags for language, mode (subtitles, dubbing, or both), and voice gender.
Highlighted Details
language_map.json.
Maintenance & Community
Issues and Pull Requests are actively accepted. Contribution guidelines are provided, particularly for adding new languages and voices. No specific community channels (e.g., Discord, Slack) are listed.
Licensing & Compatibility
The project is released under the MIT License, which permits commercial use, modification, and redistribution provided the copyright and license notice are retained.
Limitations & Caveats
Users must ensure FFmpeg is correctly added to their system PATH. CUDA memory issues may necessitate using smaller Whisper models. Potential issues with translation splitting, YouTube rate limits, or TTS voice availability are documented with workarounds. Advanced features like speaker diarization, background music separation, and voice conversion are under active development.
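Two of these caveats can be checked before a run. The sketch below is illustrative only and is not taken from the project: ffmpeg_on_path and pick_whisper_model are hypothetical helpers, and the memory figures are the approximate VRAM requirements listed in the Whisper README, used here as rough assumptions. The caller supplies the amount of free GPU memory.

```python
import shutil

# Whisper model names with approximate VRAM needs in GB, largest first.
# The numbers are rough guidance, not exact requirements.
WHISPER_MODELS = [
    ("large", 10.0),
    ("medium", 5.0),
    ("small", 2.0),
    ("base", 1.0),
    ("tiny", 1.0),
]


def ffmpeg_on_path(which=shutil.which) -> bool:
    """Return True if an ffmpeg executable can be found on the system PATH."""
    return which("ffmpeg") is not None


def pick_whisper_model(free_vram_gb: float, preferred: str = "medium") -> str:
    """Return the preferred model if it fits in free_vram_gb, otherwise the
    next smaller model that fits. Falls back to "tiny" if nothing fits."""
    names = [name for name, _ in WHISPER_MODELS]
    start = names.index(preferred)  # raises ValueError for unknown names
    for name, needed_gb in WHISPER_MODELS[start:]:
        if needed_gb <= free_vram_gb:
            return name
    return "tiny"


if not ffmpeg_on_path():
    print("FFmpeg was not found on PATH; install it or update PATH first.")
print(pick_whisper_model(3.0))
```

With about 3 GB of free memory and "medium" preferred, the helper steps down to "small", which mirrors the workaround of switching to a smaller model when CUDA runs out of memory.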