Qwen3-Audiobook-Converter by WhiskeyCoder

Audiobook converter using advanced TTS and voice cloning

Created 5 months ago

862 stars

Top 40.9% on SourcePulse

Project Summary

This project automates audiobook creation from common document formats. It targets users who want to convert PDF, EPUB, DOCX, DOC, and TXT files into spoken-word audiobooks. Its main benefit is the use of the Qwen3 TTS model for natural-sounding speech and voice cloning: users can pick a pre-built narrator or clone a custom voice from a reference audio sample.

How It Works

The converter extracts text from a supported document, then splits it into chunks of approximately 1200 words, breaking only at sentence boundaries. Each chunk is sent to a locally running Qwen3 TTS API (using the 1.7B model) for voice synthesis. The system tracks progress, caches processed chunks to avoid redundant work, and retries automatically on errors before assembling the final audio file. The design favors output quality (the 1.7B model) and avoids repeated work (the chunk cache).
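The chunking step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `split_into_chunks`, the regex-based sentence splitter, and the `max_words` parameter are all assumptions made for the example.

```python
import re


def split_into_chunks(text, max_words=1200):
    """Group whole sentences into chunks of at most roughly max_words words.

    Hypothetical helper; the real converter may detect sentences differently.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would push it over budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because sentences are never cut, a single sentence longer than `max_words` ends up as its own oversized chunk. That is why the chunk size is only approximate.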

Quick Start & Requirements

Primary install/run command:
1. Clone the repository: git clone https://github.com/WhiskeyCoder/Qwen3-Audiobook-Converter.git
2. Navigate to the directory: cd Qwen3-Audiobook-Converter
3. Install dependencies: pip install -r requirements.txt
4. Run the converter: python audiobook_converter.py (default custom voice mode), or python audiobook_converter.py --voice-clone --voice-sample path/to/reference.wav (voice cloning mode)
Non-default prerequisites:
- Qwen Voice Model running locally with Gradio API enabled (accessible at http://127.0.0.1:7860).
- Python 3.8+.
- FFmpeg installed and accessible in the system PATH.
Estimated setup time: Moderate, primarily dependent on setting up the Qwen TTS environment and installing FFmpeg.
Relevant pages: GitHub Repository (https://github.com/WhiskeyCoder/Qwen3-Audiobook-Converter)
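Before running the converter, it may help to confirm the prerequisites listed above. The following is a rough sketch using only the Python standard library. The function name `check_prerequisites` is hypothetical, and the check assumes that the Gradio server answers a plain HTTP GET at its root URL.

```python
import shutil
import sys
import urllib.error
import urllib.request


def check_prerequisites(api_url="http://127.0.0.1:7860", timeout=5):
    """Return a dict mapping each prerequisite to True or False."""
    results = {
        "python>=3.8": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    try:
        # Assumption: a running Gradio app responds to GET on its root URL.
        with urllib.request.urlopen(api_url, timeout=timeout) as response:
            results["tts api reachable"] = response.status == 200
    except (urllib.error.URLError, OSError):
        results["tts api reachable"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```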

Highlighted Details

- Dual Voice Modes: Supports pre-built, optimized narration speakers (e.g., Ryan, Serena) and a voice cloning mode that uses a reference audio sample.
- Multi-Format Support: Handles TXT, PDF, EPUB, DOCX, and DOC files.
- Smart Chunking & Caching: Splits text at sentence boundaries and caches processed chunks to improve efficiency.
- Always 1.7B Model: Uses the highest-quality Qwen model for synthesis.
- Robust Error Handling: Includes automatic retries and cleanup of temporary files.
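The caching and retry behavior listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `synthesize_with_cache` and its parameters are hypothetical, and `synthesize` stands in for whatever function actually calls the local TTS API and returns audio bytes.

```python
import hashlib
import time
from pathlib import Path


def synthesize_with_cache(chunk, synthesize, cache_dir, retries=3, delay=2.0):
    """Return the audio file path for chunk, reusing a cached file if present.

    synthesize is a caller-supplied callable (hypothetical here) that takes
    the chunk text and returns the audio as bytes.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Key the cache on the chunk text so unchanged chunks are never redone.
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    target = cache_dir / f"{key}.wav"
    if target.exists():
        return target

    tmp = target.with_suffix(".tmp")
    for attempt in range(1, retries + 1):
        try:
            tmp.write_bytes(synthesize(chunk))
            tmp.replace(target)  # move into place only after a full write
            return target
        except Exception:
            tmp.unlink(missing_ok=True)  # clean up the partial temporary file
            if attempt == retries:
                raise
            time.sleep(delay)
```

Writing to a temporary file first means an interrupted run never leaves a truncated file in the cache that a later run would mistake for finished audio.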

Maintenance & Community

Contributions are welcomed via standard Pull Requests. Support is available through GitHub Issues and Discussions. A roadmap outlines planned features like a GUI, chapter detection, and multiple output formats.

Licensing & Compatibility

This project is licensed under the MIT License, which is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The project requires a locally running Qwen TTS instance. Processing takes approximately 4-5 minutes per chunk with the 1.7B model, so large documents take a long time. Image-based PDFs may need OCR before text can be extracted. The MAX_WORKERS setting is fixed at 1 to prevent API rate limiting, so chunks are processed sequentially. Some configuration settings are hardcoded in audiobook_converter.py and must be edited manually to customize.
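To put the per-chunk figure in perspective, the sketch below estimates total conversion time from a word count, using the roughly 1200-word chunk size and the 4-5 minute range stated above. The function name `estimate_minutes` and the 90,000-word book are illustrative assumptions, and the result is only a rough guide since chunk sizes are approximate.

```python
import math


def estimate_minutes(total_words, words_per_chunk=1200, minutes_per_chunk=(4, 5)):
    """Return (chunk count, low estimate, high estimate) in minutes.

    Assumes sequential processing, i.e. one chunk at a time.
    """
    chunks = math.ceil(total_words / words_per_chunk)
    low, high = minutes_per_chunk
    return chunks, chunks * low, chunks * high


# Hypothetical 90,000-word book.
chunks, low, high = estimate_minutes(90_000)
print(f"{chunks} chunks, roughly {low} to {high} minutes")
# prints "75 chunks, roughly 300 to 375 minutes"
```

At that rate, a book of this size would take on the order of five to six hours of sequential processing.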

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

300 stars in the last 30 days