Linly-Dubbing by Kedreamix

AI dubbing/translation tool for multi-language video content creation

created 11 months ago
2,590 stars

Top 18.5% on sourcepulse

Project Summary

Linly-Dubbing is an AI-powered tool for multi-language video dubbing and translation, aimed at content creators and businesses that want to reach audiences in other languages. It automates video localization by chaining speech recognition, LLM translation, and voice-cloning speech synthesis.

How It Works

Linly-Dubbing orchestrates a pipeline of specialized AI models. It begins with an optional video download via yt-dlp, followed by vocal separation using models such as Demucs or UVR5. Speech is transcribed with WhisperX or FunASR, then translated by an LLM (OpenAI models or Qwen). Finally, speech is synthesized with XTTS, CosyVoice, or GPT-SoVITS. An optional digital human lip-sync layer, inspired by Linly-Talker, aligns the visuals with the new audio.
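The stage ordering described above can be sketched as a small pipeline runner. This is a minimal illustration, not the project's actual code: the `DubbingPipeline` class, the stage functions, and the job dictionary keys are all hypothetical, and each stage body is a placeholder standing in for the real model call (yt-dlp, Demucs/UVR5, WhisperX/FunASR, an LLM, a TTS engine).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A job is a plain dict handed from stage to stage (hypothetical design).
Job = Dict[str, object]
Stage = Callable[[Job], Job]


@dataclass
class DubbingPipeline:
    """Runs named stages in order and records which ones ran."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage, enabled: bool = True) -> "DubbingPipeline":
        # Optional stages (download, lip-sync) are simply skipped when disabled.
        if enabled:
            self.stages.append((name, stage))
        return self

    def run(self, job: Job) -> Job:
        job = dict(job)
        job["history"] = []
        for name, stage in self.stages:
            job = stage(job)
            job["history"].append(name)
        return job


# Placeholder stages: each returns a new job dict with one field added.
def download(job: Job) -> Job:
    return {**job, "video": f"downloaded({job['url']})"}  # stands in for yt-dlp


def separate(job: Job) -> Job:
    return {**job, "vocals": f"vocals({job['video']})"}  # stands in for Demucs/UVR5


def transcribe(job: Job) -> Job:
    # Stands in for WhisperX/FunASR; real output would carry many timed segments.
    return {**job, "segments": [{"start": 0.0, "end": 1.5, "text": "hello"}]}


def translate(job: Job) -> Job:
    lang = job["target_lang"]
    segments = [{**s, "translated": f"[{lang}] {s['text']}"} for s in job["segments"]]
    return {**job, "segments": segments}  # stands in for the LLM call


def synthesize(job: Job) -> Job:
    clips = [f"tts({s['translated']})" for s in job["segments"]]
    return {**job, "dubbed_audio": clips}  # stands in for XTTS/CosyVoice/GPT-SoVITS


def lip_sync(job: Job) -> Job:
    return {**job, "lip_synced": True}  # stands in for the digital human layer


def build_pipeline(with_download: bool = False, with_lip_sync: bool = False) -> DubbingPipeline:
    return (
        DubbingPipeline()
        .add("download", download, enabled=with_download)
        .add("separate", separate)
        .add("transcribe", transcribe)
        .add("translate", translate)
        .add("synthesize", synthesize)
        .add("lip_sync", lip_sync, enabled=with_lip_sync)
    )
```

Calling `build_pipeline().run({"video": "talk.mp4", "target_lang": "en"})` runs the four mandatory stages; passing `with_download=True` expects a `url` key instead of `video`. Keeping each stage behind the same callable signature is one way to make the engines swappable.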

Quick Start & Requirements

  • Installation: Clone repository, initialize submodules, create a conda environment (Python 3.10), install dependencies (requirements.txt, requirements_module.txt), and install ffmpeg.
  • Prerequisites: PyTorch (2.3.1) with CUDA (11.8 or 12.1), pynini, yt-dlp. Optional: pyannote/speaker-diarization-3.1 for speaker diarization.
  • Configuration: Requires .env file with API keys (OpenAI, Hugging Face, Baidu) and model names.
  • Execution: Run scripts/download_models.sh (Linux) or scripts/modelscope_download.py (Windows), then python webui.py.
  • Resources: Installation is noted as "very slow."
  • Online Demo: Available via Linly-Dubbing Colab.
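Because installation is slow, it may help to check the basics listed above before launching the WebUI. The following is a minimal preflight sketch under stated assumptions: the `.env` key names in `REQUIRED_KEYS` are hypothetical (the project's real variable names are not given here), and the parser handles only simple `KEY=value` lines.

```python
import shutil
import sys
from typing import Dict, Iterable, List

# Hypothetical key names; check the project's own .env template for the real ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HF_TOKEN", "BAIDU_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


def preflight(env_text: str) -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.10 expected, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    problems += [f"missing .env value: {key}" for key in missing_keys(parse_env(env_text))]
    return problems
```

A typical use would be to read the `.env` file as text, pass it to `preflight`, and print each reported problem before running `python webui.py`.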

Highlighted Details

  • Integrates multiple state-of-the-art AI models for each stage of the dubbing pipeline.
  • Supports advanced features like AI voice cloning and digital human lip-sync.
  • Offers flexibility with various LLM and TTS engine choices.
  • Includes a WebUI for simplified operation.

Maintenance & Community

The project is hosted on GitHub by Kedreamix. Links to related projects like Linly-Talker are provided.

Licensing & Compatibility

Licensed under the Apache License 2.0. Users are cautioned to comply with copyright, data protection, and privacy laws, and to obtain necessary permissions before use.

Limitations & Caveats

The installation process is described as very slow. Some advanced features require explicit access requests; speaker diarization, for example, depends on pyannote/speaker-diarization-3.1. The README notes that large model performance can be limited and recommends using more powerful APIs or models.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

240 stars in the last 90 days
