LiberSonora by LiberSonora

AI audiobook toolkit for intelligent subtitle extraction, AI title generation, and translation

created 7 months ago
433 stars


Project Summary

LiberSonora is an AI-powered, open-source toolkit for audiobook processing. It covers intelligent subtitle extraction, AI-driven title generation, and multilingual translation. It is aimed at audiobook enthusiasts and developers who want to automate these tasks, and it runs entirely locally and offline, using GPU acceleration.

How It Works

The toolkit uses a modular architecture with separate services for the UI (Streamlit), audio denoising (ClearerVoice-Studio), and speech recognition and subtitle generation (FunASR). For AI tasks it connects through Ollama to large language models such as Qwen2.5 and MiniCPM, so inference stays local and private. This design allows customization, including the use of custom LLMs, and keeps data private because the whole pipeline runs offline.
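The LLM side of this architecture can be sketched against Ollama's local REST API. This is a minimal sketch, not LiberSonora's code: it assumes an Ollama server on its default port 11434 with a model pulled under the tag `qwen2.5`, and the prompt wording and the function names `build_title_request` and `generate_title` are hypothetical. It posts to `/api/generate` with streaming disabled, so Ollama returns a single JSON object whose `response` field holds the completion.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port (11434).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_title_request(transcript: str, model: str = "qwen2.5") -> dict:
    """Build a non-streaming /api/generate payload asking for a short title.

    The prompt wording is hypothetical; LiberSonora's own prompts may differ.
    """
    prompt = (
        "Suggest a short chapter title (under 10 words) for this "
        "audiobook transcript excerpt. Reply with the title only.\n\n"
        + transcript.strip()
    )
    return {"model": model, "prompt": prompt, "stream": False}


def generate_title(transcript: str, model: str = "qwen2.5") -> str:
    """Send the request to a local Ollama server and return the generated text."""
    payload = json.dumps(build_title_request(transcript, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # With "stream": False, Ollama returns one JSON object; the full
    # completion is in its "response" field.
    return body["response"].strip()
```

Calling `generate_title(...)` needs a running Ollama server; the payload builder is kept separate so the request can be inspected or reused without one.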

Quick Start & Requirements

  • Install/Run: Clone the repository and run docker-compose -f docker-compose.gpu.yml up -d.
  • Prerequisites: Requires an NVIDIA GPU. Docker and docker-compose are necessary.
  • Setup Time: Estimated 15 minutes for dependency installation, plus model download time (typically under 10 minutes).
  • Links: Project website and documentation: https://libersonora.github.io/
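Before running the Quick Start command, a small script can check the prerequisites and assemble the command. This is a hypothetical helper, not part of the repository. It treats `nvidia-smi` on PATH only as a rough sign that an NVIDIA driver is installed; it does not verify that Docker can reach the GPU (which also needs the NVIDIA Container Toolkit). It falls back to the Compose v2 plugin syntax (`docker compose`) when the standalone `docker-compose` binary is absent.

```python
import shutil
import subprocess


def missing_prerequisites() -> list[str]:
    """Return the required tools that are not found on PATH.

    `nvidia-smi` is used as a rough proxy for an installed NVIDIA driver.
    """
    required = ["docker", "nvidia-smi"]
    return [tool for tool in required if shutil.which(tool) is None]


def compose_up_command(compose_file: str = "docker-compose.gpu.yml") -> list[str]:
    """Build the Quick Start command, preferring the standalone binary."""
    if shutil.which("docker-compose"):
        base = ["docker-compose"]
    else:
        base = ["docker", "compose"]  # Compose v2 plugin syntax
    return base + ["-f", compose_file, "up", "-d"]


def start_stack() -> None:
    """Check prerequisites, then start the GPU stack in detached mode.

    Run this from the root of the cloned repository.
    """
    missing = missing_prerequisites()
    if missing:
        raise RuntimeError(f"Missing prerequisites: {', '.join(missing)}")
    subprocess.run(compose_up_command(), check=True)
```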

Highlighted Details

  • MIT License: Fully open-source and free for commercial use.
  • Local & Offline: All audio processing and LLM inference run locally, ensuring data privacy.
  • API Support: Exposes API endpoints for integration into personal workflows.
  • Batch Processing: Supports batch processing of audio files.
  • GPU Acceleration: Utilizes NVIDIA GPUs for enhanced performance.
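Batch processing starts with collecting the input files. The sketch below illustrates only that step under stated assumptions: it does not call LiberSonora's API, whose endpoints are not described here, and the extension list, the `batch_size` default, and the function names are hypothetical.

```python
from pathlib import Path

# Hypothetical extension list; check the project docs for supported formats.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".m4b", ".wav", ".flac"}


def find_audio_files(root: str) -> list[Path]:
    """Recursively collect audio files under `root`, sorted for a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def plan_batches(root: str, batch_size: int = 4) -> list[list[Path]]:
    """Split the discovered files into fixed-size batches; the last may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    files = find_audio_files(root)
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
```

Each batch could then be submitted to whatever processing step the deployment exposes; fixed-size batches keep GPU memory use predictable compared with submitting an entire library at once.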

Maintenance & Community

The project is actively developed, with a roadmap outlining future phases including a cross-platform audiobook player. Feedback is encouraged via GitHub Issues.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

The project currently requires an NVIDIA GPU because ClearerVoice and FunASR depend on it; CPU support is low priority. Some music players have compatibility issues with the generated multilingual subtitles.

Health Check
Last commit: 3 days ago
Responsiveness: Inactive
Pull Requests (30d): 3
Issues (30d): 1
Star History: 40 stars in the last 90 days

Explore Similar Projects

Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

ultravox by fixie-ai

4k
Multimodal LLM for real-time voice interactions
created 1 year ago
updated 4 days ago