meizhong986
Advanced ASR for specialized Japanese audio
Top 43.5% on SourcePulse
WhisperJAV is a subtitle generation tool designed to overcome the significant performance degradation that standard ASR models such as Whisper show on Japanese Adult Videos (JAV). It addresses the challenges specific to JAV audio, including low signal-to-noise ratio (SNR), non-verbal vocalizations, spectral mimicry, linguistic variance, and temporal drift, and it offers improved accuracy and fewer hallucinations in this niche domain. The project targets users who need high-quality subtitles for JAV content and researchers interested in ASR for noisy, specialized audio.
How It Works
WhisperJAV employs a multi-stage inference pipeline that tackles JAV's specific acoustic and linguistic characteristics. Key strategies include:
Acoustic Filtering: scene-based segmentation and Voice Activity Detection (VAD) clamping, so that the model processes coherent audio segments.
Linguistic Adaptation: normalization of domain-specific terminology and correction of dialect-induced tokenization errors.
Defensive Decoding: tuned log-probability thresholds and regex filters that systematically discard low-confidence outputs and non-lexical markers, thereby mitigating hallucinations.
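The Defensive Decoding idea can be illustrated with a minimal sketch. This is not WhisperJAV's actual code: the `Segment` type, the `-1.0` threshold, the regex patterns, and the function names are all hypothetical, chosen only to show how a log-probability cutoff and regex filters might be combined to drop low-confidence and non-lexical output.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for one decoded subtitle segment."""
    start: float
    end: float
    text: str
    avg_logprob: float  # mean token log-probability reported by the decoder


# Assumed patterns, for illustration only.
NON_LEXICAL_PATTERNS = [
    # Text made up solely of interjection-like kana and punctuation.
    re.compile(r"[あぁアァんンっッうぅウゥはハー〜…。、!!??\s]+"),
    # Text that is entirely a bracketed marker such as "[音楽]".
    re.compile(r"[\[((【].*[\]))】]"),
]


def is_non_lexical(text: str) -> bool:
    """Return True if the text is empty or fully matches a non-lexical pattern."""
    stripped = text.strip()
    if not stripped:
        return True
    return any(p.fullmatch(stripped) for p in NON_LEXICAL_PATTERNS)


def filter_segments(segments, logprob_threshold=-1.0):
    """Keep segments that clear the confidence cutoff and carry lexical content."""
    return [
        seg for seg in segments
        if seg.avg_logprob >= logprob_threshold and not is_non_lexical(seg.text)
    ]


segments = [
    Segment(0.0, 1.0, "ああっ…", -0.2),                      # non-lexical vocalization
    Segment(1.0, 2.0, "気持ちいい", -0.3),                    # confident, lexical
    Segment(2.0, 3.0, "ご視聴ありがとうございました", -1.8),  # low confidence
    Segment(3.0, 4.0, "[音楽]", -0.1),                        # bracketed marker
]
kept = filter_segments(segments)
```

In this sketch a stricter (higher) threshold discards more segments, which trades recall for fewer hallucinated lines; the right value would have to be tuned on real audio.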
Quick Start & Requirements
WhisperJAV-1.7.4-Windows-x86_64.exe). Alternatively, install from source using the provided scripts (install_windows.bat, install_linux.sh, install.py), which auto-detect GPUs and CUDA versions. Manual pip installation is also supported.
Highlighted Details
faster, fast, balanced (default), fidelity, and transformers (utilizing a Japanese-optimized model).
conservative, balanced, aggressive to control hallucination thresholds.
transformers + balanced) for potentially enhanced accuracy.
auditok (default), silero, and semantic methods.
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord/Slack), or sponsorship were found in the provided README.
Licensing & Compatibility
Limitations & Caveats
Python versions 3.13 and above are incompatible. AMD GPU (ROCm) support is experimental, and CPU-only processing is notably slow. The tool generates subtitles for accessibility, and users bear responsibility for adhering to relevant laws.
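A preflight check for the Python version limit could look like the sketch below. It encodes only the upper bound stated above; the function name is hypothetical, and no minimum version is checked because none is given here.

```python
import sys

# From the caveat above: Python 3.13 and later are incompatible.
# Assumption: no lower bound is enforced, since none is stated.
FIRST_UNSUPPORTED = (3, 13)


def is_supported_python(version=None) -> bool:
    """Return True if the (major, minor) version is below the first unsupported release."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor < FIRST_UNSUPPORTED


if not is_supported_python():
    print("Warning: this interpreter is Python 3.13 or newer, which is listed as incompatible.")
```

Running a check like this before installation gives a clear message up front rather than a dependency error partway through setup.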