TransWithAI
Optimized Japanese-to-Chinese audio/video transcription and translation
Top 43.0% on SourcePulse
Summary
This project is a high-performance tool for audio/video transcription and translation, optimized specifically for Japanese-to-Chinese conversion. It targets users who need accurate transcription and offers both local GPU acceleration and optional cloud processing. The solution integrates Faster Whisper, a specialized translation model, and an optimized voice activity detection (VAD) module.
How It Works
The system uses SYSTRAN/faster-whisper for efficient inference and a voice-optimized VAD model (TransWithAI/Whisper-Vad-EncDec-ASMR-onnx) for improved speech segmentation. Its core innovation is the "ChickenRice v2" model, fine-tuned on 5,000 hours of audio data to deliver high-accuracy Japanese-to-Chinese translation. The architecture supports GPU acceleration via CUDA and offers cloud inference via the Modal platform, which removes the dependency on local hardware.
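The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: `translate_file`, `to_srt`, and `format_timestamp` are hypothetical names, `model_path` is assumed to point to a Whisper model in CTranslate2 format, the fine-tuned model is assumed to emit Chinese under `task="translate"`, and faster-whisper's built-in `vad_filter` stands in for the project's own VAD model.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render (start, end, text) segments as SRT subtitle text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def translate_file(media_path: str, model_path: str, device: str = "cuda") -> str:
    """Run Japanese speech through a Whisper model and return SRT text.

    Assumptions (not confirmed by the project): model_path is a
    CTranslate2-format model directory, and the fine-tuned model
    produces Chinese text when task="translate" is requested.
    """
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    compute_type = "float16" if device == "cuda" else "int8"
    model = WhisperModel(model_path, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(
        media_path,
        language="ja",
        task="translate",
        vad_filter=True,  # built-in VAD, a stand-in for the project's VAD model
    )
    return to_srt((seg.start, seg.end, seg.text) for seg in segments)
```

Keeping the subtitle formatting separate from the model call means the output stage can be checked without a GPU or a downloaded model.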
Quick Start & Requirements
Download pre-built packages from Releases. To process a file, drag it onto the batch script (.bat) for the desired mode: GPU, low-VRAM GPU, CPU, or video translation. Detailed usage guides are in the local "使用说明" (usage instructions) document.
Prerequisites include an NVIDIA GPU with a compatible CUDA version (11.8, 12.2, or 12.8). Cloud inference via Modal requires Python, the modal library, and a Modal account. Supported formats include common audio (mp3, wav, flac) and video (mp4, mkv, avi) files.
Key components include Faster Whisper (https://github.com/SYSTRAN/faster-whisper), the ChickenRice model (https://huggingface.co/chickenrice0721/whisper-large-v2-translate-zh-v0.2-st), the VAD model (https://huggingface.co/TransWithAI/Whisper-Vad-EncDec-ASMR-onnx), OpenAI Whisper (https://github.com/openai/whisper), and Modal (https://modal.com/).
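When scripting around the batch files, a small pre-flight check on file extensions can catch unsupported inputs early. The sketch below covers only the six formats named above; the project may accept others, and `classify_media` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path

# Only the formats listed in the summary; the real tool may accept more.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi"}


def classify_media(path: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```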
Highlighted Details
Maintenance & Community
The project is developed by AI汉化组 (AI Localization Group), which can be reached via Telegram (https://t.me/transWithAI). Sponsorship inquiries are welcome via GitHub Issues. The project acknowledges contributions from anonymous group members.
Licensing & Compatibility
The project is released under the permissive MIT License. It is described as completely free and open-source, with no explicit restrictions noted on commercial use or integration into closed-source projects.
Limitations & Caveats
Optimal performance requires specific NVIDIA GPUs and a compatible CUDA installation. Modal cloud inference incurs costs beyond the free tier. Users must select the correct pre-built package (Base vs. ChickenRice Edition). Subtitles with excessively long durations may occur and can require manual configuration adjustments.
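For the long-subtitle issue, one possible workaround outside the tool's own configuration is to post-process the segments and cap each subtitle's display time. The sketch below is an assumption-laden illustration: `cap_duration` is a hypothetical helper, the 8-second default is arbitrary, and it does not reflect the project's actual configuration options.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def cap_duration(segments: List[Segment], max_seconds: float = 8.0) -> List[Segment]:
    """Pull in the end time of any subtitle that lasts longer than max_seconds."""
    capped = []
    for start, end, text in segments:
        if end - start > max_seconds:
            end = start + max_seconds  # keep the start, shorten the display time
        capped.append((start, end, text))
    return capped
```

Capping the end time leaves the text and start time untouched, so the subtitle still appears when speech begins but no longer lingers on screen through long silences.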
1 month ago
Inactive