Faster-Whisper-TransWithAI-ChickenRice by TransWithAI

Optimized Japanese-to-Chinese audio/video transcription and translation

Created 3 months ago
822 stars

Top 43.0% on SourcePulse

Project Summary

Summary

This project offers a high-performance tool for audio/video transcription and translation, specifically optimized for Japanese-to-Chinese conversion. It targets users who need accurate transcription, with GPU acceleration and optional cloud processing. The solution integrates Faster Whisper, a specialized translation model, and an optimized voice activity detection (VAD) module.

How It Works

The system uses SYSTRAN/faster-whisper for efficient inference, incorporating a voice-optimized VAD model (TransWithAI/Whisper-Vad-EncDec-ASMR-onnx) for improved speech segmentation. Its core innovation is the "ChickenRice v2" model, fine-tuned on 5,000 hours of audio, which delivers high-accuracy Japanese-to-Chinese translation. The architecture supports GPU acceleration via CUDA and offers cloud inference via the Modal platform, removing the need for local GPU hardware.
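
The pipeline above (VAD segmentation, then Whisper inference) can be sketched with faster-whisper's public API. This is a minimal sketch, not the project's actual code: the model directory name is a placeholder (the ChickenRice checkpoint would need conversion to CTranslate2 format first), and `vad_filter=True` uses faster-whisper's built-in Silero VAD as a stand-in for the project's custom ONNX VAD model. The short-gap merge threshold is likewise an assumption, illustrating a common post-VAD cleanup step.

```python
def transcribe_ja_to_zh(audio_path, model_dir="whisper-large-v2-translate-zh"):
    """Transcribe Japanese audio into Chinese text segments.

    model_dir is a placeholder: the real ChickenRice checkpoint must be
    converted to CTranslate2 format before faster-whisper can load it.
    """
    from faster_whisper import WhisperModel  # deferred: heavy optional dep

    model = WhisperModel(model_dir, device="cuda", compute_type="float16")
    # vad_filter=True enables faster-whisper's built-in Silero VAD; the
    # project ships its own ONNX VAD model instead, which this sketch omits.
    segments, info = model.transcribe(audio_path, language="ja", vad_filter=True)
    return [(s.start, s.end, s.text) for s in segments]


def merge_close_segments(segments, max_gap=0.3):
    """Merge consecutive (start, end, text) segments separated by a short
    silence, a common post-VAD cleanup step (the threshold is an assumption)."""
    merged = []
    for start, end, text in segments:
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, _, prev_text = merged[-1]
            merged[-1] = (prev_start, end, prev_text + text)
        else:
            merged.append((start, end, text))
    return merged
```

Deferring the `faster_whisper` import keeps the cleanup helper usable on machines without the GPU stack installed.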

Quick Start & Requirements

Download pre-built packages from Releases. To run, drag media files onto the batch scripts (.bat) for GPU, low-VRAM GPU, CPU, or video translation modes. Local use requires an NVIDIA GPU with a compatible CUDA version (11.8, 12.2, or 12.8). Cloud inference via Modal requires Python, the modal library, and a Modal account. Supported formats include common audio (mp3, wav, flac) and video (mp4, mkv, avi) files. Detailed usage guides are in the bundled "使用说明" (usage instructions) document.

Key components:

  • Faster Whisper: https://github.com/SYSTRAN/faster-whisper
  • ChickenRice model: https://huggingface.co/chickenrice0721/whisper-large-v2-translate-zh-v0.2-st
  • VAD model: https://huggingface.co/TransWithAI/Whisper-Vad-EncDec-ASMR-onnx
  • OpenAI Whisper: https://github.com/openai/whisper
  • Modal: https://modal.com/

Highlighted Details

  • Specialized "ChickenRice v2" model for highly accurate Japanese-to-Chinese translation.
  • GPU acceleration leveraging CUDA 11.8/12.2/12.8 for NVIDIA hardware.
  • Optional cloud inference via Modal, enabling use without local GPUs.
  • Support for multiple subtitle formats (SRT, VTT, LRC) and various media file types.
  • Intelligent caching mechanism to speed up batch processing.
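
Two of the features above (SRT subtitle output and content-based caching) can be illustrated with a short sketch. This is not the project's implementation: `cache_key` and `to_srt` are hypothetical helpers showing the general idea of hashing a media file so repeated batch runs can skip already-transcribed inputs, and of rendering timed segments in SRT format.

```python
import hashlib


def cache_key(path, chunk=1 << 20):
    """Content hash of a media file, usable as a cache key so repeated batch
    runs can skip files already processed (a sketch of the idea; the tool's
    actual caching scheme is not documented here)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def to_srt(segments):
    """Render (start, end, text) tuples as an SRT document."""
    def ts(sec):
        # SRT timestamps are HH:MM:SS,mmm
        ms = round(sec * 1000)
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = [
        f"{i}\n{ts(start)} --> {ts(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(segments, 1)
    ]
    return "\n".join(blocks)
```

Hashing file contents rather than paths means renamed or moved files still hit the cache, at the cost of one read per file.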

Maintenance & Community

Developed by AI汉化组 (AI Localization Group), reachable via Telegram (https://t.me/transWithAI). Sponsorship inquiries are welcomed via GitHub Issues. The project acknowledges contributions from anonymous group members.

Licensing & Compatibility

Released under the permissive MIT License. The project is described as completely free and open source, with no explicit restrictions on commercial use or integration into closed-source projects.

Limitations & Caveats

Optimal performance requires specific NVIDIA GPUs and compatible CUDA installations. Modal cloud inference incurs costs beyond the free tier. Users must choose the correct pre-built package (Base vs. ChickenRice Edition). Excessively long subtitle durations may require manual configuration adjustments.
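
The over-long-subtitle caveat can be handled in post-processing by capping cue duration. A minimal sketch follows; `split_long_segments` and the 10-second default are assumptions for illustration (the tool's actual configuration keys are not shown here), and real subtitle tools would split on word or phrase boundaries rather than raw character counts.

```python
import math


def split_long_segments(segments, max_dur=10.0):
    """Split any (start, end, text) cue longer than max_dur seconds into
    equal-length pieces, dividing text proportionally by character count
    (naive: production tools split on word or phrase boundaries)."""
    out = []
    for start, end, text in segments:
        dur = end - start
        if dur <= max_dur:
            out.append((start, end, text))
            continue
        n = math.ceil(dur / max_dur)          # number of pieces
        step = dur / n                        # duration of each piece
        chunk = math.ceil(len(text) / n)      # characters per piece
        for i in range(n):
            piece = text[i * chunk:(i + 1) * chunk]
            out.append((start + i * step, min(start + (i + 1) * step, end), piece))
    return out
```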

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 10
  • Star History: 529 stars in the last 30 days
