xiaoniu by agan-j

AI tool for multilingual video localization and content creation

Created 2 years ago

382 stars

Top 74.3% on SourcePulse

Project Summary

"小牛视频翻译" (Xiaoniu Video Translator) is an AI-powered tool designed to automate video translation and localization for creators. It addresses the challenge of language barriers by enabling one-click translation of video speech and subtitles into multiple languages, generating new audio with precise lip-sync, and supporting YouTube video downloads. The tool aims to help creators efficiently produce multilingual content for platforms like TikTok and YouTube, expanding their global reach.

How It Works

The system uses a self-developed AI subtitle translation model, trained on a large dataset, to improve semantic understanding and contextual accuracy. Its core workflow follows a "5-step method": understanding the video's core content, contextual translation, cultural adjustment, AI-driven reflection and correction, and final subtitle proofreading. Key functions include automatic speech recognition (ASR) for subtitle generation, multi-language subtitle translation, text-to-speech (TTS) synthesis with voice cloning and lip-sync, and voice separation. A web UI allows real-time editing of subtitles and audio.
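The ordering of the "5-step method" can be sketched as a simple staged pipeline. This is only an illustration of the sequence described above, not the tool's actual implementation: `SubtitleJob`, `make_stage`, and `run_pipeline` are hypothetical names, and each stage is a placeholder that records its name instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubtitleJob:
    """Hypothetical container for one subtitle translation request."""
    source_text: str
    target_lang: str
    completed_steps: List[str] = field(default_factory=list)


Stage = Callable[[SubtitleJob], SubtitleJob]

# Step names taken from the summary; everything else is an assumption.
STEP_NAMES = [
    "understand video core content",
    "contextual translation",
    "cultural adjustment",
    "AI-driven reflection and correction",
    "final subtitle proofreading",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(job: SubtitleJob) -> SubtitleJob:
        job.completed_steps.append(name)
        return job
    return stage


def run_pipeline(job: SubtitleJob, stages: List[Stage]) -> SubtitleJob:
    """Apply each stage in order, passing the job from one to the next."""
    for stage in stages:
        job = stage(job)
    return job


if __name__ == "__main__":
    stages = [make_stage(name) for name in STEP_NAMES]
    result = run_pipeline(SubtitleJob("你好，世界", "en"), stages)
    for index, step in enumerate(result.completed_steps, start=1):
        print(index, step)
```

Keeping each step as a separate callable makes the fixed order explicit and would let a real stage be swapped in without touching the others.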

Quick Start & Requirements

Installation: Distributed as a portable ("green") build; extract the archive and run it, with no installer needed.
OS: Windows, macOS (Intel and Apple Silicon).
Hardware: CPU versions available. GPU versions require CUDA 12.7 or below; specific versions for RTX 50 series GPUs are offered.
Access: Run the executable and access the web UI via http://127.0.0.1:8181/home.
Dependencies: Model files can be downloaded separately or will download automatically upon first run. Links provided via Baidu Netdisk and Tianyi Cloud.
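After launching the executable, a short script can confirm that the web UI is answering before opening a browser. The sketch below uses only the Python standard library and the address quoted in the Access line; the helper name `is_ui_reachable` and the timeout value are illustrative assumptions, and the port may differ between builds.

```python
import urllib.error
import urllib.request

# Address quoted in the Access line; treat the port as build-dependent.
WEB_UI_URL = "http://127.0.0.1:8181/home"


def is_ui_reachable(url: str = WEB_UI_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` completes with a 2xx status."""
    # An empty ProxyHandler bypasses any system proxy, which should not
    # be consulted for a loopback address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (4xx/5xx).
        return False
```

Calling `is_ui_reachable()` in a loop with a short sleep is one way to wait for the first run to finish, since model files may still be downloading at that point.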

Highlighted Details

Synchronizes lip and mouth-shape movement with the translated audio.
Supports advanced voice cloning with over 100 cloned voices and multiple language options.
Integrates dual AI models (Gemma, Kimi+Gemma), with a claimed translation accuracy of up to 98%.
Optimized for Apple Silicon Macs, with a claimed speedup of up to 7x.
Provides ASR for 23 Chinese dialects, with a claimed accuracy of 99%.
Includes advanced audio processing for noise reduction and voice separation.

Maintenance & Community

Whether the project will be open-sourced depends on community feedback; users are encouraged to weigh in via GitHub Issues.
Frequent updates suggest active development and attention to user needs.

Licensing & Compatibility

The project's license is not specified in the provided README.
Compatibility for commercial use or linking with closed-source projects is undetermined.

Limitations & Caveats

The decision to open-source is contingent on community engagement, introducing uncertainty regarding its long-term availability. The absence of a specified license is a significant adoption blocker for users requiring clear legal terms.

Health Check

Last Commit: 5 days ago
Responsiveness: Inactive

Star History: 18 stars in the last 30 days