openlrc by zh-plus

Python library for audio transcription and translation to LRC files

created 2 years ago
573 stars

Top 57.1% on sourcepulse

Project Summary

This Python library automates the transcription and translation of audio files into LRC subtitle format, targeting users who need to create multilingual subtitles for podcasts, audiobooks, or other spoken content. It leverages faster-whisper for transcription and various LLMs (OpenAI, Anthropic, Gemini) for translation and polishing, offering context-aware translation and support for custom glossaries to enhance accuracy.
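
For orientation, a minimal end-to-end run looks roughly like the sketch below. The LRCer class and run() call mirror the usage shown in the project's README; treat this as an illustrative sketch and check the current README for the exact API.

    from openlrc import LRCer

    if __name__ == '__main__':
        lrcer = LRCer()
        # Transcribe the audio with faster-whisper, translate with an LLM,
        # and write a timestamped .lrc file next to the input.
        lrcer.run('./data/episode.mp3', target_lang='zh-cn')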

How It Works

The system processes audio files using faster-whisper for transcription, with optional audio preprocessing like loudness normalization and noise suppression to minimize transcription errors. The transcribed text is then translated into a target language using an LLM, with options for context-aware translation and the inclusion of custom glossaries for domain-specific terminology. The output is formatted as an LRC file, with support for bilingual subtitles.
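
Assuming the keyword arguments documented in the upstream README (noise_suppress, bilingual_sub, and a glossary passed to the LRCer constructor; verify against your installed version), the stages described above map onto options roughly like this:

    from openlrc import LRCer

    # Path to a custom glossary file for domain-specific terminology
    # (format and parameter name assumed from the upstream README).
    lrcer = LRCer(glossary='./data/glossary.yaml')
    lrcer.run(
        './data/lecture.mp3',
        target_lang='ja',
        noise_suppress=True,   # optional audio preprocessing before transcription
        bilingual_sub=True,    # keep the source text alongside the translation
    )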

Quick Start & Requirements

  • Install via pip: pip install openlrc
  • Install faster-whisper from source: pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
  • Install PyTorch with CUDA 11.x/12.x support: pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  • GPU execution of faster-whisper requires the CUDA 11.x and cuDNN 8 libraries to be installed.
  • API keys for LLM providers (OpenAI, Anthropic, Google) can be set as environment variables (see the example below).
  • ffmpeg must be installed and in the system's PATH.
  • Official Documentation: https://github.com/zh-plus/openlrc
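
The environment variables below are the names conventionally read by the respective provider SDKs (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY); a quick smoke test of the installation, assuming the LRCer API sketched above, could look like:

    import os

    # Set whichever provider key you plan to use (conventional SDK variable names).
    os.environ['OPENAI_API_KEY'] = 'sk-...'      # OpenAI
    # os.environ['ANTHROPIC_API_KEY'] = '...'    # Anthropic
    # os.environ['GOOGLE_API_KEY'] = '...'       # Google Gemini

    from openlrc import LRCer

    # Transcribe and translate a short sample file to confirm the setup works.
    LRCer().run('./sample.mp3', target_lang='en')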

Highlighted Details

  • Supports multiple LLM providers including OpenAI (GPT), Anthropic (Claude), and Google (Gemini).
  • Allows custom endpoints for LLM providers and routing models to specific SDKs (see the sketch after this list).
  • Offers bilingual subtitle generation and glossary support for improved translation accuracy.
  • Includes options for audio preprocessing like loudness normalization and noise suppression.
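
As a sketch of model selection and custom endpoints: the chatbot_model and base_url_config parameter names below are taken from the upstream README and may change between releases, so treat them as assumptions to verify.

    from openlrc import LRCer

    # Route translation to a specific model; the model name here is just an example.
    lrcer = LRCer(chatbot_model='claude-3-5-sonnet-20240620')

    # Point a provider at a custom, OpenAI-compatible endpoint (assumed option).
    lrcer = LRCer(
        chatbot_model='gpt-4o',
        base_url_config={'openai': 'https://your-proxy.example.com/v1'},
    )
    lrcer.run('./data/interview.mp3', target_lang='ko', bilingual_sub=True)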

Maintenance & Community

The project is actively maintained with recent updates in May-September 2024. It references several key open-source projects in its credits.

Licensing & Compatibility

The project appears to be under a permissive license, but specific details are not explicitly stated in the README. Compatibility with commercial or closed-source applications would require verification of the underlying dependencies' licenses.

Limitations & Caveats

The project depends on a specific commit of faster-whisper that is not published to PyPI. Installation requires manual setup of the CUDA and cuDNN libraries, which can be complex. Some advanced features, such as local LLM support and voice-music separation, are still on the to-do list.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 34 stars in the last 90 days
