WhiskeyCoder
Audiobook converter using advanced TTS and voice cloning
Top 71.9% on SourcePulse
This project automates high-quality audiobook creation from common document formats. It targets users who want to convert text-based files (PDF, EPUB, DOCX, DOC, and TXT) into spoken-word audiobooks. The primary benefit is the use of the Qwen3 TTS voice model for natural speech generation and voice cloning, offering both pre-built high-quality narrators and the ability to clone custom voices.
How It Works
The converter extracts text from supported document types, then splits it into chunks of approximately 1200 words, breaking only at sentence boundaries. Each chunk is sent to a locally running Qwen3 TTS API (using the 1.7B model) for voice synthesis. The system tracks progress, caches processed chunks to avoid redundant work, and handles errors before assembling the final audio file. This approach prioritizes quality and avoids repeated synthesis work by pairing a capable TTS model with chunk-level caching.
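The chunking and caching steps described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names split_into_chunks and cache_path, the regex-based sentence splitter, and the hash-based cache file naming are all assumptions made for the example.

```python
import hashlib
import re
from pathlib import Path


def split_into_chunks(text: str, max_words: int = 1200) -> list[str]:
    """Group whole sentences into chunks of at most ~max_words words.

    Hypothetical helper: sentences are split naively after '.', '!' or '?'
    followed by whitespace. A single sentence longer than max_words is kept
    whole, so sentence boundaries are never broken.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def cache_path(chunk: str, cache_dir: Path) -> Path:
    """Map a chunk to a deterministic audio file path (assumed cache layout).

    If this file already exists, the chunk can be skipped instead of being
    sent to the TTS API again.
    """
    digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}.wav"


if __name__ == "__main__":
    sample = "One two three. Four five six! Seven eight nine? Ten."
    for chunk in split_into_chunks(sample, max_words=6):
        print(cache_path(chunk, Path("cache")).name, "<-", chunk)
```

Keying the cache on a hash of the chunk text means an interrupted run can resume without re-synthesizing finished chunks, at the cost of invalidating the cache whenever the chunk size or source text changes.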
Quick Start & Requirements
git clone https://github.com/WhiskeyCoder/Qwen3-Audiobook-Converter.git
cd Qwen3-Audiobook-Converter
pip install -r requirements.txt
python audiobook_converter.py (default custom voice) or python audiobook_converter.py --voice-clone --voice-sample path/to/reference.wav
… http://127.0.0.1:7860).
Highlighted Details
Maintenance & Community
Contributions are welcomed via standard Pull Requests. Support is available through GitHub Issues and Discussions. A roadmap outlines planned features like a GUI, chapter detection, and multiple output formats.
Licensing & Compatibility
This project is licensed under the MIT License, which is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The project requires a locally running Qwen3 TTS instance. Processing takes approximately 4-5 minutes per chunk with the 1.7B model, so large documents take significant time. Text extraction from image-based PDFs may require prior OCR. The MAX_WORKERS setting is fixed at 1 to prevent API rate limiting, so chunks are processed sequentially. Some configuration settings are hardcoded in audiobook_converter.py and must be edited manually to customize them.
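To get a feel for what the per-chunk timing means for a whole book, a rough estimate can be computed from the figures quoted above (about 1200 words per chunk, 4-5 minutes per chunk, sequential processing). The helper name estimate_minutes and the 90,000-word input are hypothetical, and real runs will vary with hardware and text.

```python
import math


def estimate_minutes(
    total_words: int,
    words_per_chunk: int = 1200,
    minutes_per_chunk: tuple[float, float] = (4.0, 5.0),
) -> tuple[int, float, float]:
    """Return (chunk count, low estimate, high estimate) in minutes.

    Assumes chunks are processed one at a time, so the total time is simply
    the chunk count multiplied by the per-chunk time.
    """
    chunks = math.ceil(total_words / words_per_chunk)
    low, high = minutes_per_chunk
    return chunks, chunks * low, chunks * high


if __name__ == "__main__":
    # Hypothetical 90,000-word book.
    chunks, low, high = estimate_minutes(90_000)
    print(f"{chunks} chunks, roughly {low / 60:.1f} to {high / 60:.2f} hours")
```

Under these assumptions a 90,000-word document splits into 75 chunks and needs roughly 300 to 375 minutes of synthesis time.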