LiberSonora by LiberSonora

AI audiobook toolkit for intelligent subtitle extraction, AI title generation, and translation

Created 1 year ago

458 stars

Top 66.0% on SourcePulse

Project Summary

LiberSonora is an AI-powered, open-source toolkit for audiobook processing. Its features include intelligent subtitle extraction, AI-driven title generation, and multilingual translation. It targets audiobook enthusiasts and developers who want to automate and enhance their audiobook workflow, and it runs locally and offline with GPU acceleration.

How It Works

The toolkit uses a modular architecture with separate services for the UI (Streamlit), audio denoising (ClearerVoice-Studio), and speech recognition and subtitle generation (FunASR). AI tasks run through Ollama, which serves large language models such as Qwen2.5 and MiniCPM locally, so inference stays private. This design allows customization, including custom LLMs, and keeps data on the machine by operating entirely offline.
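To make the Ollama integration concrete, here is a minimal sketch of how a title-generation request could be sent to a local Ollama server. It is not taken from the LiberSonora codebase: the prompt wording, the function names, and the `qwen2.5` model tag are assumptions chosen for illustration. Only the `/api/generate` endpoint and its `model`, `prompt`, and `stream` fields come from Ollama's documented REST API.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_title_request(transcript: str, model: str = "qwen2.5") -> dict:
    """Build a non-streaming /api/generate payload asking for a chapter title."""
    # Hypothetical prompt; the project's real prompts may differ.
    prompt = (
        "Suggest a short chapter title for this audiobook excerpt. "
        "Reply with the title only.\n\n" + transcript.strip()
    )
    return {"model": model, "prompt": prompt, "stream": False}


def generate_title(transcript: str, model: str = "qwen2.5") -> str:
    """Send the request to the local Ollama server and return the generated text."""
    body = json.dumps(build_title_request(transcript, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(response.read())["response"].strip()
```

Calling `generate_title(chapter_text)` requires a running Ollama instance with the chosen model already pulled. Because the request goes to localhost, the transcript never leaves the machine.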

Quick Start & Requirements

Install/Run: Clone the repository and run docker-compose -f docker-compose.gpu.yml up -d.
Prerequisites: An NVIDIA GPU, Docker, and docker-compose.
Setup Time: Estimated 15 minutes for dependency installation, plus model download time (typically under 10 minutes).
Links: Project website and documentation: https://libersonora.github.io/
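Before starting the stack, it can help to confirm that the prerequisites are installed. The following is a small preflight sketch, not part of the project. It assumes that finding `docker`, `docker-compose`, and `nvidia-smi` on the PATH is a reasonable proxy for the requirements listed above. The helper name `missing_tools` is hypothetical.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumption: these executables stand in for the listed prerequisites;
# nvidia-smi is installed alongside the NVIDIA driver.
REQUIRED_TOOLS = ("docker", "docker-compose", "nvidia-smi")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found; ready to start the GPU stack.")
```

The `which` parameter exists only so the lookup can be replaced in tests. A passing check does not guarantee that Docker can actually reach the GPU, which also depends on the container runtime configuration.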

Highlighted Details

MIT License: Fully open-source and free for commercial use.
Local & Offline: All audio processing and LLM inference run locally, ensuring data privacy.
API Support: Exposes API endpoints for integration into personal workflows.
Batch Processing: Supports batch processing of audio files.
GPU Acceleration: Utilizes NVIDIA GPUs for enhanced performance.
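As an illustration of the batch-processing idea, the sketch below gathers the audio files in a folder and orders them so that chapter 2 comes before chapter 10. It does not use LiberSonora's API. The extension list and the function names are assumptions for illustration, and the resulting list would still need to be handed to whatever processing step is in use.

```python
import re
from pathlib import Path
from typing import List

# Assumption: illustrative extensions, not the toolkit's supported list.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac"}


def natural_key(name: str) -> list:
    """Split a name into text and integer parts so 'ch2' sorts before 'ch10'."""
    return [
        int(part) if part.isdecimal() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


def collect_audio_files(folder: Path) -> List[Path]:
    """Return audio files directly inside folder, in natural chapter order."""
    files = [
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    ]
    return sorted(files, key=lambda path: natural_key(path.name))
```

A plain alphabetical sort would place `ch10.mp3` before `ch2.mp3`. Comparing the numeric parts as integers keeps chapters in listening order.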

Maintenance & Community

The project describes itself as actively developed, and its roadmap outlines future phases, including a cross-platform audiobook player. Feedback is encouraged via GitHub Issues.

Licensing & Compatibility

License: MIT License.
Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

The project currently requires an NVIDIA GPU because of its dependencies on ClearerVoice and FunASR; CPU support is considered low priority. Some music players have compatibility issues with the generated multilingual subtitles.

Health Check

Last Commit: 7 months ago
Responsiveness: Inactive

Star History

3 stars in the last 30 days