xiaoniu by agan-j

AI tool for multilingual video localization and content creation

Created 1 year ago
253 stars

Top 99.3% on SourcePulse

Project Summary

"小牛视频翻译" (Xiaoniu Video Translator) is an AI-powered tool designed to automate video translation and localization for creators. It addresses the challenge of language barriers by enabling one-click translation of video speech and subtitles into multiple languages, generating new audio with precise lip-sync, and supporting YouTube video downloads. The tool aims to help creators efficiently produce multilingual content for platforms like TikTok and YouTube, expanding their global reach.

How It Works

The system uses a self-developed AI subtitle translation model, trained on a large dataset, to improve semantic understanding and contextual accuracy. Its core workflow follows a "5-step method": understanding the video's core content, contextual translation, cultural adjustment, AI-driven reflection and correction, and final subtitle proofreading. Key functions include automatic speech recognition (ASR) for subtitle generation, multi-language subtitle translation, text-to-speech (TTS) synthesis with voice cloning and lip-sync, and voice separation. A web UI allows real-time editing of subtitles and audio.
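
The "5-step method" above can be pictured as a chain of stages applied in order to a subtitle job. The following is only a minimal sketch under stated assumptions: the project's real code is not published in this summary, so every name here (SubtitleJob, the stage functions, run_pipeline) is hypothetical, and each stage body is a placeholder rather than an actual model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubtitleJob:
    """Hypothetical container for one translation job (not the project's type)."""
    source_lines: List[str]
    target_lang: str
    translated: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


Stage = Callable[[SubtitleJob], SubtitleJob]


def understand_core(job: SubtitleJob) -> SubtitleJob:
    # Placeholder: a real stage would summarize the video's topic and tone.
    job.log.append("understand_core")
    return job


def translate_in_context(job: SubtitleJob) -> SubtitleJob:
    # Placeholder: tag each line with the target language instead of translating.
    job.translated = [f"<{job.target_lang}> {line}" for line in job.source_lines]
    job.log.append("translate_in_context")
    return job


def adjust_for_culture(job: SubtitleJob) -> SubtitleJob:
    # Placeholder: a real stage would localize idioms, units, and references.
    job.log.append("adjust_for_culture")
    return job


def reflect_and_correct(job: SubtitleJob) -> SubtitleJob:
    # Placeholder: a real stage would have a model review its own output.
    job.translated = [line.strip() for line in job.translated]
    job.log.append("reflect_and_correct")
    return job


def proofread(job: SubtitleJob) -> SubtitleJob:
    # Placeholder: final pass over the subtitle text.
    job.log.append("proofread")
    return job


FIVE_STEPS: List[Stage] = [
    understand_core,
    translate_in_context,
    adjust_for_culture,
    reflect_and_correct,
    proofread,
]


def run_pipeline(job: SubtitleJob, stages: List[Stage] = FIVE_STEPS) -> SubtitleJob:
    """Apply each stage in order and return the updated job."""
    for stage in stages:
        job = stage(job)
    return job


if __name__ == "__main__":
    result = run_pipeline(SubtitleJob(["hello"], "en"))
    print(result.log)
    print(result.translated)
```

Keeping each step as a separate function with the same signature makes the order explicit and lets a single stage be swapped or skipped without touching the others.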

Quick Start & Requirements

  • Installation: Distributed as a portable ("green") build; extract the archive and run it, with no installer required.
  • OS: Windows, macOS (Intel and Apple Silicon).
  • Hardware: CPU versions available. GPU versions require CUDA 12.7 or below; specific versions for RTX 50 series GPUs are offered.
  • Access: Run the executable and access the web UI via http://127.0.0.1:8181/home.
  • Dependencies: Model files can be downloaded separately or are downloaded automatically on first run. Download links are provided via Baidu Netdisk and Tianyi Cloud.
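
After starting the executable, it can be handy to confirm that the web UI is actually listening before opening a browser. The snippet below is a small sketch using only the Python standard library. The address and path come from the Access item above; the function names and the timeout value are illustrative assumptions, not part of the tool.

```python
import urllib.error
import urllib.request


def web_ui_url(host: str = "127.0.0.1", port: int = 8181, path: str = "/home") -> str:
    """Build the local web UI address given in the Quick Start list."""
    return f"http://{host}:{port}{path}"


def is_ui_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the URL, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means a server is listening.
        return True
    except OSError:
        # Covers URLError, connection refused, and timeouts.
        return False


if __name__ == "__main__":
    url = web_ui_url()
    state = "reachable" if is_ui_reachable(url) else "not reachable"
    print(f"{url} is {state}")
```

If the check reports the UI as not reachable right after launch, the model files may still be downloading on first run, so retrying after a short wait is reasonable.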

Highlighted Details

  • Synchronizes lip movement and mouth shape with the translated audio.
  • Supports advanced voice cloning with over 100 cloned voices and multiple language options.
  • Integrates dual AI models (Gemma, Kimi+Gemma), with a claimed translation accuracy of up to 98%.
  • Optimized for Apple Silicon Macs, with a claimed performance improvement of up to 7x.
  • Provides ASR for 23 Chinese dialects, with a claimed accuracy of 99%.
  • Includes advanced audio processing for noise reduction and voice separation.

Maintenance & Community

  • The project's future open-source status is explicitly dependent on community feedback, encouraging user input via GitHub Issues.
  • Frequent updates demonstrate active development and responsiveness to user needs.

Licensing & Compatibility

  • The project's license is not specified in the provided README.
  • Compatibility for commercial use or linking with closed-source projects is undetermined.

Limitations & Caveats

The decision to open-source is contingent on community engagement, introducing uncertainty regarding its long-term availability. The absence of a specified license is a significant adoption blocker for users requiring clear legal terms.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 14 stars in the last 30 days
