All-in-one tool for media translation automation
Top 18.7% on sourcepulse
Chenyme-AAVT is an open-source project that automates the process of translating audio and video content. It targets content creators, educators, and individuals needing to localize media, offering a free, efficient, and customizable workflow.
How It Works
The project leverages Whisper for audio recognition, large language models (LLMs) like ChatGPT, Claude, Gemini, and DeepSeek for subtitle translation, and FFmpeg for video processing. It supports full localization and free deployment, enabling users to generate blog posts and marketing materials from videos, translate and modify subtitles, and preview changes. The architecture is designed for GPU acceleration and includes VAD (Voice Activity Detection) for optimized processing.
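The subtitle stage of a pipeline like this ultimately has to serialize recognized (and translated) segments into a subtitle file. As a minimal sketch, the hypothetical helper below (not the project's actual code) shows how Whisper-style segments, dicts with start/end times in seconds and a text field, map onto the SRT format:

```python
# Hypothetical helper, not part of Chenyme-AAVT: converts Whisper-style
# segments into SRT subtitle blocks.

def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render a list of {'start', 'end', 'text'} segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_srt_timestamp(seg['start'])} --> "
            f"{to_srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

segments = [
    {"start": 0.0, "end": 2.5, "text": "Hello, world."},
    {"start": 2.5, "end": 5.0, "text": "Translated line."},
]
print(segments_to_srt(segments))
```

In the real workflow, the text field of each segment would be replaced by the LLM's translation before serialization, which is what makes subtitle editing and preview possible before the file is burned in with FFmpeg.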
Quick Start & Requirements
Windows: download the release, run 1_Install.bat, then 2_WebUI.bat. Requires Python (>3.8), FFmpeg (included in the release), and optionally CUDA (11.8, 12.1, or 12.4 recommended) for GPU acceleration.
Mac: install FFmpeg (brew install ffmpeg), download the Mac release, navigate to the directory, and run pip3 install -r requirements.txt followed by streamlit run Chenyme-AAVT.py.
Docker: docker pull chenyme/chenyme-aavt:latest or use docker-compose, then access the app via <server_ip>:8501.
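For the docker-compose route, a minimal compose file might look like the sketch below. The image name comes from the docs above; the service name, port mapping, and restart policy are assumptions, so check the project's own compose file before relying on this:

```yaml
# Hypothetical docker-compose.yml sketch; only the image name is from the docs.
services:
  chenyme-aavt:
    image: chenyme/chenyme-aavt:latest
    ports:
      - "8501:8501"   # Streamlit's default port, matching <server_ip>:8501
    restart: unless-stopped
```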
Highlighted Details
Maintenance & Community
The project is actively developed, with recent updates focusing on Whisper integration, LLM support, and video features. A Telegram group is available for community support and discussion.
Licensing & Compatibility
The project is licensed under the MIT license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project is under active development, with some features marked as "TODO" (e.g., voice identification, real-time translation, lip-sync correction). Users may encounter missing-DLL errors on Windows, for which workarounds are provided. The README notes that update frequency may slow due to the developer's exam preparations.