Python library for audio transcription and translation to LRC files
This Python library automates the transcription and translation of audio files into LRC subtitle format, targeting users who need to create multilingual subtitles for podcasts, audiobooks, or other spoken content. It leverages faster-whisper for transcription and various LLMs (OpenAI, Anthropic, Gemini) for translation and polishing, offering context-aware translation and support for custom glossaries to enhance accuracy.
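"Context-aware translation" and "custom glossaries" can be made concrete with a small example. The Python sketch below shows one way a translation prompt might combine a few preceding subtitle lines (as context) with only those glossary entries whose source term appears in the current line. `Segment`, `build_prompt`, and the prompt wording are hypothetical illustrations, not openlrc's API or its actual prompts.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed subtitle line (hypothetical structure for illustration)."""
    start: float  # start time in seconds
    text: str


def build_prompt(segments: list[Segment], index: int, glossary: dict[str, str],
                 target_lang: str, context_size: int = 2) -> str:
    """Hypothetical helper: assemble an LLM prompt for segments[index].

    Up to `context_size` preceding lines are included as context only,
    and the glossary is filtered to terms that occur in the current line.
    """
    lo = max(0, index - context_size)
    context = [s.text for s in segments[lo:index]]
    src = segments[index].text

    # Case-insensitive substring match decides which glossary entries apply.
    hits = {term: tr for term, tr in glossary.items() if term.lower() in src.lower()}

    lines = [f"Translate the subtitle line into {target_lang}."]
    if context:
        lines.append("Previous lines (for context only, do not translate):")
        lines.extend(f"- {c}" for c in context)
    if hits:
        lines.append("Use these glossary translations:")
        lines.extend(f"- {term} -> {tr}" for term, tr in sorted(hits.items()))
    lines.append(f"Line: {src}")
    return "\n".join(lines)


if __name__ == "__main__":
    segs = [Segment(0.0, "Welcome back to the show."),
            Segment(2.5, "Today we talk about Kubernetes.")]
    print(build_prompt(segs, 1, {"Kubernetes": "Kubernetes"}, "Simplified Chinese"))
```

Filtering the glossary per line keeps each prompt short and avoids distracting the model with irrelevant terms; a real pipeline might instead translate several lines per LLM call to share context and reduce cost.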
How It Works
The system transcribes audio files with faster-whisper, with optional audio preprocessing (loudness normalization and noise suppression) to reduce transcription errors. The transcribed text is then translated into the target language by an LLM, with options for context-aware translation and for custom glossaries that cover domain-specific terminology. The output is written as an LRC file, optionally with bilingual subtitles.
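The LRC output format itself is simple: each lyric line starts with a `[mm:ss.xx]` timestamp tag. The Python sketch below assumes transcription and translation yield `(start_seconds, original, translation)` tuples and writes a bilingual file by emitting the original and the translation under the same timestamp. That layout is one common convention assumed here for illustration; openlrc's actual bilingual layout may differ, and `lrc_timestamp` and `to_bilingual_lrc` are hypothetical names.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format a time in seconds as an LRC [mm:ss.xx] tag (centisecond precision)."""
    total_cs = round(seconds * 100)          # total centiseconds
    minutes, rem = divmod(total_cs, 6000)    # 6000 cs per minute
    secs, cs = divmod(rem, 100)              # 100 cs per second
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"


def to_bilingual_lrc(segments) -> str:
    """Hypothetical writer: segments is an iterable of (start_seconds, original, translation).

    Each original line is followed by its translation under the same timestamp;
    an empty translation produces only the original line.
    """
    lines = []
    for start, original, translation in segments:
        tag = lrc_timestamp(start)
        lines.append(f"{tag}{original}")
        if translation:
            lines.append(f"{tag}{translation}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_bilingual_lrc([(0.0, "Hello and welcome.", "你好，欢迎。"),
                            (65.5, "See you next time.", "下次见。")]), end="")
```

Emitting both languages under one timestamp keeps them in sync without a separate alignment step; players that show only one line per timestamp would need a different layout, such as joining both texts on a single line.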
Quick Start & Requirements
pip install openlrc
pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Highlighted Details
Maintenance & Community
The project is actively maintained with recent updates in May-September 2024. It references several key open-source projects in its credits.
Licensing & Compatibility
The project appears to be under a permissive license, but specific details are not explicitly stated in the README. Compatibility with commercial or closed-source applications would require verification of the underlying dependencies' licenses.
Limitations & Caveats
The project depends on a specific commit of faster-whisper that is not published on PyPI, which is why it is installed from a GitHub archive URL above. Installation also requires manual setup of the CUDA and cuDNN libraries, which can be complex. Some advanced features, such as local LLM support and voice-music separation, are still on the project's to-do list.