open-dubbing by Softcatala

AI dubbing system for videos

Created 1 year ago

345 stars

Top 80.3% on SourcePulse

Project Summary

Open Dubbing is an experimental command-line AI system for automatically translating and synchronizing video dialogue into different languages. It's designed for users interested in understanding and experimenting with the integration of Speech-to-Text (STT), Text-to-Speech (TTS), and machine translation technologies for video localization.

How It Works

The system orchestrates a pipeline of existing models and services: Whisper for STT, NLLB-200 or the Apertium API for translation, and Coqui, MMS, Edge, or OpenAI for TTS. It detects the source language automatically and lets you configure the voice gender assigned to synthetic voices. Because it builds on established components, the pipeline is flexible and can run locally when local engines are selected; Edge TTS, OpenAI TTS, and the Apertium API are remote services.
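The three-stage flow described above can be sketched as a small orchestration loop. This is a minimal illustration, not the project's actual code: the `Utterance` class, the `dub` function, and the stand-in stages are all hypothetical names, and the real tool also handles tasks this sketch omits, such as audio extraction and synchronization.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    """One stretch of speech in the source video (times in seconds)."""
    start: float
    end: float
    text: str = ""          # filled in by the STT stage
    translation: str = ""   # filled in by the translation stage
    audio_path: str = ""    # filled in by the TTS stage


def dub(
    utterances: List[Utterance],
    transcribe: Callable[[Utterance], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str, Utterance], str],
) -> List[Utterance]:
    """Run STT, translation, and TTS in order for every utterance."""
    dubbed = []
    for utt in utterances:
        text = transcribe(utt)                      # STT (e.g. Whisper)
        translation = translate(text)               # MT (e.g. NLLB-200)
        audio_path = synthesize(translation, utt)   # TTS (e.g. MMS)
        dubbed.append(
            Utterance(utt.start, utt.end, text, translation, audio_path)
        )
    return dubbed


# Stand-in stages (assumptions) so the sketch runs without any model installed.
def fake_stt(utt: Utterance) -> str:
    return f"speech at {utt.start:.1f}s"


def fake_translate(text: str) -> str:
    return f"[cat] {text}"


def fake_tts(text: str, utt: Utterance) -> str:
    return f"dubbed_{utt.start:.1f}_{utt.end:.1f}.wav"


result = dub([Utterance(0.0, 1.5)], fake_stt, fake_translate, fake_tts)
print(result[0])
```

Passing the stages in as callables mirrors the idea that engines are interchangeable: swapping one TTS engine for another changes only the `synthesize` argument.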

Quick Start & Requirements

Install: pip install open_dubbing (install open_dubbing[coqui] or open_dubbing[openai] instead for Coqui or OpenAI TTS support).
Prerequisites: ffmpeg, installed system-wide (Linux, macOS, and Windows). espeak-ng is needed for Coqui TTS on Linux and macOS. A Hugging Face token is required for model access, and the pyannote.audio user conditions must be accepted.
Usage: open-dubbing --input_file video.mp4 --target_language=cat --hugging_face_token=TOKEN (here cat is the ISO 639-3 code for Catalan)
Documentation: https://www.softcatala.org/doblatge/
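The prerequisites and usage line above can be combined into a small preflight script. The sketch below is an assumption-laden illustration: `missing_tools` and `build_command` are hypothetical helper names, and only the three flags shown in the usage line are used. It checks whether the external tools are on the PATH and assembles the command without running it.

```python
import shutil
from typing import List


def missing_tools(tools: List[str]) -> List[str]:
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def build_command(input_file: str, target_language: str, token: str) -> List[str]:
    """Assemble the open-dubbing invocation using the flags from the usage line."""
    return [
        "open-dubbing",
        "--input_file", input_file,
        f"--target_language={target_language}",
        f"--hugging_face_token={token}",
    ]


if __name__ == "__main__":
    # espeak-ng only matters for Coqui TTS on Linux and macOS.
    absent = missing_tools(["ffmpeg", "espeak-ng"])
    if absent:
        print("Missing prerequisites:", ", ".join(absent))

    command = build_command("video.mp4", "cat", "TOKEN")
    print("Would run:", " ".join(command))
    # To launch the tool for real, pass `command` to subprocess.run(command, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when the input path contains spaces.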

Highlighted Details

Supports multiple TTS engines: Coqui, MMS, Edge, OpenAI TTS.
Integrates Whisper for automatic source language detection.
Utilizes NLLB-200 for machine translation.
Allows post-editing of intermediate files for fine-tuning dubbing.

Maintenance & Community

The project is developed by Softcatalà. Contact: Jordi Mas (jmas@softcatala.org).

Licensing & Compatibility

The project appears to be under a permissive license, but dependencies such as pyannote.audio may carry their own terms. Verify commercial use against the licenses of all dependencies.

Limitations & Caveats

This is an experimental project, and errors can occur at any stage of the pipeline (speech recognition, translation, TTS). Language support depends on the specific combination of STT, translation, and TTS models used.
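One way to picture the language-support caveat is as a set intersection: a language is usable end to end only if every stage in the chosen combination handles it. The sketch below is a simplification (it treats each stage as having a single language set and ignores the source/target distinction), `supported_languages` is a hypothetical helper, and the language sets are made-up placeholders, not the real coverage of any engine.

```python
from typing import Dict, Set


def supported_languages(stage_languages: Dict[str, Set[str]]) -> Set[str]:
    """Return the languages that every stage in the mapping supports."""
    language_sets = list(stage_languages.values())
    if not language_sets:
        return set()
    return set.intersection(*language_sets)


# Placeholder data for illustration only (ISO 639-3 codes).
example_stack = {
    "stt": {"eng", "spa", "cat"},
    "translation": {"eng", "cat", "fra"},
    "tts": {"cat", "fra", "eng"},
}

print(sorted(supported_languages(example_stack)))
```

Swapping a single engine changes one of the sets and can therefore shrink or grow the usable list, which is why support has to be checked per combination.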

Health Check

Last Commit

6 months ago

Responsiveness

1 day

Star History

17 stars in the last 30 days