openlrc by zh-plus

Python library for audio transcription and translation to LRC files

created 2 years ago
573 stars

Top 57.1% on sourcepulse

Project Summary

This Python library automates the transcription and translation of audio files into LRC subtitle format, targeting users who need to create multilingual subtitles for podcasts, audiobooks, or other spoken content. It leverages faster-whisper for transcription and various LLMs (OpenAI, Anthropic, Gemini) for translation and polishing, offering context-aware translation and support for custom glossaries to enhance accuracy.
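
For orientation, a minimal end-to-end run looks roughly like the sketch below. The LRCer class and run() call mirror the usage shown in the project's README; treat this as an illustrative sketch and check the current README for the exact API.

    from openlrc import LRCer

    if __name__ == '__main__':
        lrcer = LRCer()
        # Transcribe the audio with faster-whisper, translate with an LLM,
        # and write a timestamped .lrc file next to the input.
        lrcer.run('./data/episode.mp3', target_lang='zh-cn')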

How It Works

The system processes audio files using faster-whisper for transcription, with optional audio preprocessing like loudness normalization and noise suppression to minimize transcription errors. The transcribed text is then translated into a target language using an LLM, with options for context-aware translation and the inclusion of custom glossaries for domain-specific terminology. The output is formatted as an LRC file, with support for bilingual subtitles.
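
Assuming the keyword arguments documented in the upstream README (noise_suppress, bilingual_sub, and a glossary passed to the LRCer constructor; verify against your installed version), the stages described above map onto options roughly like this:

    from openlrc import LRCer

    # Path to a custom glossary file for domain-specific terminology
    # (format and parameter name assumed from the upstream README).
    lrcer = LRCer(glossary='./data/glossary.yaml')
    lrcer.run(
        './data/lecture.mp3',
        target_lang='ja',
        noise_suppress=True,   # optional audio preprocessing before transcription
        bilingual_sub=True,    # keep the source text alongside the translation
    )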

Quick Start & Requirements

  • Install via pip: pip install openlrc
  • Install faster-whisper from source: pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
  • Install PyTorch with CUDA 11.x/12.x support: pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  • GPU execution of faster-whisper requires the CUDA 11.x and cuDNN 8 libraries to be installed.
  • API keys for LLM providers (OpenAI, Anthropic, Google) can be set as environment variables (see the example below).
  • ffmpeg must be installed and in the system's PATH.
  • Official Documentation: https://github.com/zh-plus/openlrc
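
The environment variables below are the names conventionally read by the respective provider SDKs (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY); a quick smoke test of the installation, assuming the LRCer API sketched above, could look like:

    import os

    # Set whichever provider key you plan to use (conventional SDK variable names).
    os.environ['OPENAI_API_KEY'] = 'sk-...'      # OpenAI
    # os.environ['ANTHROPIC_API_KEY'] = '...'    # Anthropic
    # os.environ['GOOGLE_API_KEY'] = '...'       # Google Gemini

    from openlrc import LRCer

    # Transcribe and translate a short sample file to confirm the setup works.
    LRCer().run('./sample.mp3', target_lang='en')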

Highlighted Details

  • Supports multiple LLM providers including OpenAI (GPT), Anthropic (Claude), and Google (Gemini).
  • Allows custom endpoints for LLM providers and routing models to specific SDKs (see the sketch after this list).
  • Offers bilingual subtitle generation and glossary support for improved translation accuracy.
  • Includes options for audio preprocessing like loudness normalization and noise suppression.
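
As a sketch of model selection and custom endpoints: the chatbot_model and base_url_config parameter names below are taken from the upstream README and may change between releases, so treat them as assumptions to verify.

    from openlrc import LRCer

    # Route translation to a specific model; the model name here is just an example.
    lrcer = LRCer(chatbot_model='claude-3-5-sonnet-20240620')

    # Point a provider at a custom, OpenAI-compatible endpoint (assumed option).
    lrcer = LRCer(
        chatbot_model='gpt-4o',
        base_url_config={'openai': 'https://your-proxy.example.com/v1'},
    )
    lrcer.run('./data/interview.mp3', target_lang='ko', bilingual_sub=True)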

Maintenance & Community

The project is actively maintained with recent updates in May-September 2024. It references several key open-source projects in its credits.

Licensing & Compatibility

The project appears to be under a permissive license, but specific details are not explicitly stated in the README. Compatibility with commercial or closed-source applications would require verification of the underlying dependencies' licenses.

Limitations & Caveats

The project depends on a specific commit of faster-whisper that is not published to PyPI. Installation requires manual setup of the CUDA and cuDNN libraries, which can be complex. Some advanced features, such as local LLM support and voice-music separation, are still on the to-do list.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 34 stars in the last 90 days
