Chenyme-AAVT by chenyme

All-in-one tool for media translation automation

created 1 year ago
2,558 stars

Top 18.7% on sourcepulse

Project Summary

Chenyme-AAVT is an open-source project that automates the process of translating audio and video content. It targets content creators, educators, and individuals needing to localize media, offering a free, efficient, and customizable workflow.

How It Works

The project uses Whisper for speech recognition, large language models (LLMs) such as ChatGPT, Claude, Gemini, and DeepSeek for subtitle translation, and FFmpeg for video processing. It supports full localization and free deployment. Users can translate, modify, and preview subtitles, and generate blog posts and marketing materials from videos. Processing can use GPU acceleration, and VAD (Voice Activity Detection) is included to optimize processing.
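The recognize, translate, and render stages described above can be pictured with a minimal sketch. This is not the project's own code: the function names (srt_timestamp, build_srt, ffmpeg_burn_command) are hypothetical, the segment shape (start, end, text) is an assumption modeled on what Whisper-style recognizers commonly return, and the translate callable stands in for whichever LLM engine is configured.

```python
from typing import Callable, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments: List[Dict], translate: Callable[[str], str]) -> str:
    """Turn recognized segments into SRT text, translating each line.

    `segments` is assumed to be a list of dicts with "start", "end" (seconds)
    and "text" keys; `translate` is a placeholder for an LLM call.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{translate(seg['text'])}\n")
    return "\n".join(blocks)


def ffmpeg_burn_command(video: str, srt: str, output: str) -> List[str]:
    """Build (but do not run) an FFmpeg command that burns subtitles into a video."""
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}", "-c:a", "copy", output]
```

Keeping the translator as a plain callable means the same subtitle-building step works with any of the listed engines or a local model; the command list could be passed to subprocess.run once FFmpeg is installed.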

Quick Start & Requirements

  • Windows: Download the latest release, run 1_Install.bat, then 2_WebUI.bat. Requires Python (>3.8), FFmpeg (included in release), and optionally CUDA (11.8, 12.1, 12.4 recommended) for GPU acceleration.
  • macOS: Install Python (>3.8), Homebrew, and FFmpeg (brew install ffmpeg). Download the Mac release, navigate to the directory, and run pip3 install -r requirements.txt followed by streamlit run Chenyme-AAVT.py.
  • Docker: Run docker pull chenyme/chenyme-aavt:latest, or use docker-compose. Access the web UI at <server_ip>:8501.
  • Colab: Available via a provided notebook.
  • Documentation: Installation tutorials and FAQs are linked.
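
Before a manual install, the stated prerequisites can be checked with a short script. This is only a sketch: the helper names python_ok and missing_tools are hypothetical, and reading "Python (>3.8)" as "3.8 or newer" is an assumption.

```python
import shutil
import sys
from typing import List, Sequence, Tuple


def python_ok(version: Sequence[int] = sys.version_info[:2],
              minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the (major, minor) version meets the assumed minimum."""
    return tuple(version) >= minimum


def missing_tools(tools: Sequence[str] = ("ffmpeg",)) -> List[str]:
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing tools:", missing_tools())
```

An empty list from missing_tools() means FFmpeg was found on PATH; on Windows the release bundles FFmpeg, so this check matters mainly for the macOS route.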

Highlighted Details

  • Supports multiple LLM translation engines and local LLM integration.
  • Offers personalized subtitles, multiple subtitle formats, and preview/editing capabilities.
  • Includes features for generating blog content and marketing materials from videos.
  • Future plans include real-time translation, lip-sync correction, and voice cloning.

Maintenance & Community

The project is actively developed, with recent updates focusing on Whisper integration, LLM support, and video features. A Telegram group is available for community support and discussion.

Licensing & Compatibility

The project is licensed under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is under active development, and some features are still marked as "TODO" (e.g., voice identification, real-time translation, lip-sync correction). Windows users may encounter missing-DLL errors, for which workarounds are provided. The README notes that updates may slow down because of the developer's exam preparations.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 217 stars in the last 90 days