Linly-Dubbing by Kedreamix

AI dubbing/translation tool for multi-language video content creation

Created 1 year ago

2,896 stars

Top 16.4% on SourcePulse

Project Summary

Linly-Dubbing is an AI-powered tool for multi-language video dubbing and translation, aimed at content creators and businesses that want to reach a global audience. It automates video localization by chaining speech recognition, LLM-based translation, and AI speech synthesis, with optional voice cloning and lip-sync to make the dubbed result sound and look more natural.

How It Works

Linly-Dubbing orchestrates a pipeline of specialized AI models. An optional first step downloads the video with yt-dlp. Vocals are then separated with a model such as Demucs or UVR5, speech is transcribed with WhisperX or FunASR, and the transcript is translated by an LLM such as an OpenAI model or Qwen. Finally, speech is synthesized with XTTS, CosyVoice, or GPT-SoVITS; an optional digital human lip-sync layer, inspired by Linly-Talker, synchronizes the visuals with the new audio.
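The stage ordering described above can be sketched as a list of interchangeable stage functions. This is a minimal illustration, not the project's actual code: `DubbingJob`, `make_stage`, `build_pipeline`, and `run` are hypothetical names, and each stage is a placeholder that only records which backend would run.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DubbingJob:
    """State passed between stages. All field names are illustrative."""
    source: str
    target_language: str
    log: List[str] = field(default_factory=list)


Stage = Callable[[DubbingJob], DubbingJob]


def make_stage(name: str, backend: str) -> Stage:
    """Build a placeholder stage that only records the backend it would call."""
    def stage(job: DubbingJob) -> DubbingJob:
        job.log.append(f"{name}: {backend}")
        return job
    return stage


def build_pipeline(download: bool = True, lip_sync: bool = False) -> List[Stage]:
    """Assemble stages in the order given in the summary; backends are examples."""
    stages: List[Stage] = []
    if download:  # optional video download
        stages.append(make_stage("download", "yt-dlp"))
    stages += [
        make_stage("separate_vocals", "Demucs"),   # or UVR5
        make_stage("transcribe", "WhisperX"),      # or FunASR
        make_stage("translate", "Qwen"),           # or an OpenAI model
        make_stage("synthesize", "XTTS"),          # or CosyVoice, GPT-SoVITS
    ]
    if lip_sync:  # optional digital human layer
        stages.append(make_stage("lip_sync", "Linly-Talker-inspired"))
    return stages


def run(job: DubbingJob, stages: List[Stage]) -> DubbingJob:
    """Feed the job through each stage in order."""
    for stage in stages:
        job = stage(job)
    return job


if __name__ == "__main__":
    result = run(DubbingJob("https://example.com/video", "en"), build_pipeline())
    print("\n".join(result.log))
```

Keeping every stage behind the same signature is one way to let a backend be swapped without touching the rest of the chain, which matches the tool's choice of engines per stage.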

Quick Start & Requirements

Installation: Clone repository, initialize submodules, create a conda environment (Python 3.10), install dependencies (requirements.txt, requirements_module.txt), and install ffmpeg.
Prerequisites: PyTorch (2.3.1) with CUDA (11.8 or 12.1), pynini, yt-dlp. Optional: pyannote/speaker-diarization-3.1 for speaker diarization.
Configuration: Requires .env file with API keys (OpenAI, Hugging Face, Baidu) and model names.
Execution: Run scripts/download_models.sh (Linux) or scripts/modelscope_download.py (Windows), then python webui.py.
Resources: The README describes installation as "very slow."
Online Demo: Available via Linly-Dubbing Colab.
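Before launching webui.py, it can help to confirm that the prerequisites listed above are in place. The following is a rough preflight sketch using only the Python standard library; it is not part of the project. The key names in `EXAMPLE_KEYS` are hypothetical placeholders, so the real names should be taken from the project's own .env template.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Iterable, List

# Hypothetical key names for illustration; the project's .env template is authoritative.
EXAMPLE_KEYS = ("OPENAI_API_KEY", "HF_TOKEN", "MODEL_NAME")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(env_text: str, required_keys: Iterable[str] = EXAMPLE_KEYS) -> List[str]:
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    if sys.version_info[:2] != (3, 10):
        problems.append("Python 3.10 is the version named in the setup steps")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    for module in missing_modules(("torch", "yt_dlp", "pynini")):
        problems.append(f"missing module: {module}")
    for key in missing_keys(parse_env(env_text), required_keys):
        problems.append(f"missing .env key: {key}")
    return problems


if __name__ == "__main__":
    try:
        with open(".env", encoding="utf-8") as handle:
            text = handle.read()
    except FileNotFoundError:
        text = ""
    for problem in preflight(text):
        print(problem)
```

The check only reports problems and does not verify the PyTorch or CUDA versions, which would require importing torch itself.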

Highlighted Details

Integrates multiple state-of-the-art AI models for each stage of the dubbing pipeline.
Supports advanced features like AI voice cloning and digital human lip-sync.
Offers flexibility with various LLM and TTS engine choices.
Includes a WebUI for simplified operation.

Maintenance & Community

The project is hosted on GitHub by Kedreamix. Links to related projects like Linly-Talker are provided.

Licensing & Compatibility

Licensed under the Apache License 2.0. Users are cautioned to comply with copyright, data protection, and privacy laws, and to obtain necessary permissions before use.

Limitations & Caveats

The README describes the installation process as very slow. Speaker diarization depends on pyannote/speaker-diarization-3.1, a gated model on Hugging Face for which access must be requested explicitly. The README also notes that the performance of large models can be limiting in some cases and recommends switching to a more capable API or model.

Health Check

Last Commit: 10 months ago
Responsiveness: Inactive

Star History

84 stars in the last 30 days