Faster-Whisper-TransWithAI-ChickenRice by TransWithAI

Optimized Japanese-to-Chinese audio/video transcription and translation

Created 3 months ago
822 stars

Top 43.0% on SourcePulse

Project Summary

Summary

This project offers a high-performance tool for audio/video transcription and translation, specifically optimized for Japanese-to-Chinese conversion. It targets users who need accurate transcription, with GPU acceleration and optional cloud processing. The solution integrates Faster Whisper, a specialized translation model, and an optimized voice activity detection (VAD) module.

How It Works

The system uses SYSTRAN/faster-whisper for efficient inference, incorporating a voice-optimized VAD model (TransWithAI/Whisper-Vad-EncDec-ASMR-onnx) for improved speech segmentation. Its core innovation is the "ChickenRice v2" model, fine-tuned on 5,000 hours of audio, which delivers high-accuracy Japanese-to-Chinese translation. The architecture supports GPU acceleration via CUDA and offers cloud inference via the Modal platform, removing the need for local GPU hardware.
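
The pipeline above (VAD segmentation, then Whisper inference) can be sketched with faster-whisper's public API. This is a minimal sketch, not the project's actual code: the model directory name is a placeholder (the ChickenRice checkpoint would need conversion to CTranslate2 format first), and `vad_filter=True` uses faster-whisper's built-in Silero VAD as a stand-in for the project's custom ONNX VAD model. The short-gap merge threshold is likewise an assumption, illustrating a common post-VAD cleanup step.

```python
def transcribe_ja_to_zh(audio_path, model_dir="whisper-large-v2-translate-zh"):
    """Transcribe Japanese audio into Chinese text segments.

    model_dir is a placeholder: the real ChickenRice checkpoint must be
    converted to CTranslate2 format before faster-whisper can load it.
    """
    from faster_whisper import WhisperModel  # deferred: heavy optional dep

    model = WhisperModel(model_dir, device="cuda", compute_type="float16")
    # vad_filter=True enables faster-whisper's built-in Silero VAD; the
    # project ships its own ONNX VAD model instead, which this sketch omits.
    segments, info = model.transcribe(audio_path, language="ja", vad_filter=True)
    return [(s.start, s.end, s.text) for s in segments]


def merge_close_segments(segments, max_gap=0.3):
    """Merge consecutive (start, end, text) segments separated by a short
    silence, a common post-VAD cleanup step (the threshold is an assumption)."""
    merged = []
    for start, end, text in segments:
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, _, prev_text = merged[-1]
            merged[-1] = (prev_start, end, prev_text + text)
        else:
            merged.append((start, end, text))
    return merged
```

Deferring the `faster_whisper` import keeps the cleanup helper usable on machines without the GPU stack installed.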

Quick Start & Requirements

Download pre-built packages from Releases. To run, drag media files onto the batch scripts (.bat) for GPU, low-VRAM GPU, CPU, or video translation modes. Local use requires an NVIDIA GPU with a compatible CUDA version (11.8, 12.2, or 12.8). Cloud inference via Modal requires Python, the modal library, and a Modal account. Supported formats include common audio (mp3, wav, flac) and video (mp4, mkv, avi) files. Detailed usage guides are in the bundled "使用说明" (usage instructions) document.

Key components:

  • Faster Whisper: https://github.com/SYSTRAN/faster-whisper
  • ChickenRice model: https://huggingface.co/chickenrice0721/whisper-large-v2-translate-zh-v0.2-st
  • VAD model: https://huggingface.co/TransWithAI/Whisper-Vad-EncDec-ASMR-onnx
  • OpenAI Whisper: https://github.com/openai/whisper
  • Modal: https://modal.com/

Highlighted Details

  • Specialized "ChickenRice v2" model for highly accurate Japanese-to-Chinese translation.
  • GPU acceleration leveraging CUDA 11.8/12.2/12.8 for NVIDIA hardware.
  • Optional cloud inference via Modal, enabling use without local GPUs.
  • Support for multiple subtitle formats (SRT, VTT, LRC) and various media file types.
  • Intelligent caching mechanism to speed up batch processing.
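
Two of the features above (SRT subtitle output and content-based caching) can be illustrated with a short sketch. This is not the project's implementation: `cache_key` and `to_srt` are hypothetical helpers showing the general idea of hashing a media file so repeated batch runs can skip already-transcribed inputs, and of rendering timed segments in SRT format.

```python
import hashlib


def cache_key(path, chunk=1 << 20):
    """Content hash of a media file, usable as a cache key so repeated batch
    runs can skip files already processed (a sketch of the idea; the tool's
    actual caching scheme is not documented here)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def to_srt(segments):
    """Render (start, end, text) tuples as an SRT document."""
    def ts(sec):
        # SRT timestamps are HH:MM:SS,mmm
        ms = round(sec * 1000)
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = [
        f"{i}\n{ts(start)} --> {ts(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(segments, 1)
    ]
    return "\n".join(blocks)
```

Hashing file contents rather than paths means renamed or moved files still hit the cache, at the cost of one read per file.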

Maintenance & Community

Developed by AI汉化组 (AI Localization Group), reachable via Telegram (https://t.me/transWithAI). Sponsorship inquiries are welcomed via GitHub Issues. The project acknowledges contributions from anonymous group members.

Licensing & Compatibility

Released under the permissive MIT License. The project is described as completely free and open source, with no explicit restrictions on commercial use or integration into closed-source projects.

Limitations & Caveats

Optimal performance requires specific NVIDIA GPUs and compatible CUDA installations. Modal cloud inference incurs costs beyond the free tier. Users must choose the correct pre-built package (Base vs. ChickenRice Edition). Excessively long subtitle durations may require manual configuration adjustments.
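
The over-long-subtitle caveat can be handled in post-processing by capping cue duration. A minimal sketch follows; `split_long_segments` and the 10-second default are assumptions for illustration (the tool's actual configuration keys are not shown here), and real subtitle tools would split on word or phrase boundaries rather than raw character counts.

```python
import math


def split_long_segments(segments, max_dur=10.0):
    """Split any (start, end, text) cue longer than max_dur seconds into
    equal-length pieces, dividing text proportionally by character count
    (naive: production tools split on word or phrase boundaries)."""
    out = []
    for start, end, text in segments:
        dur = end - start
        if dur <= max_dur:
            out.append((start, end, text))
            continue
        n = math.ceil(dur / max_dur)          # number of pieces
        step = dur / n                        # duration of each piece
        chunk = math.ceil(len(text) / n)      # characters per piece
        for i in range(n):
            piece = text[i * chunk:(i + 1) * chunk]
            out.append((start + i * step, min(start + (i + 1) * step, end), piece))
    return out
```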

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 10
  • Star History: 529 stars in the last 30 days
