TTS-Story by Xerophayze

Multi-voice audiobook generation studio

Created 6 months ago
250 stars

Summary

TTS-Story is a web-based, multi-voice TTS studio that turns tagged scripts into audiobooks. Its backend supports many TTS engines, both local GPU models and cloud APIs, including Kokoro, Chatterbox, and VoxCPM. Key features include advanced speaker management, voice cloning, a job queue, a library system, and efficient M4B audiobook export, aimed at users who need professional audiobook production tools.

How It Works

The application exposes twelve TTS engines (e.g., Kokoro, Chatterbox, VoxCPM, Pocket-TTS, KittenTTS, IndexTTS) through a single web interface, letting users choose between local GPU inference (NVIDIA CUDA required) and cloud API backends (Replicate). It parses speaker tags to assign voices, splits text into chunks, supports voice cloning from short audio prompts, and uses parallel processing to speed up M4B audiobook export. Local inference keeps scripts and audio on the user's machine and avoids per-request API charges, while the Replicate backends remove the need for a local GPU.
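The exact tag syntax TTS-Story expects is not described here, so the following is only a sketch of the parse-then-chunk step under a hypothetical format: a line starting with `[Speaker]` switches the active voice, and each speaker's text is split at sentence boundaries into chunks of at most `max_words` words. The names `parse_script` and `chunk_text` and the `[Speaker]` syntax are illustrative assumptions, not the project's API.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax: "[Name]" at the start of a line switches speaker.
TAG_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.*)$")
# Split after ., ! or ? followed by whitespace (a simple sentence heuristic).
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def parse_script(script: str, default: str = "narrator") -> List[Tuple[str, str]]:
    """Group script lines into (speaker, text) segments."""
    segments: List[Tuple[str, str]] = []
    speaker = default
    buffer: List[str] = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TAG_RE.match(line)
        if match:
            # Flush the previous speaker's text before switching voices.
            if buffer:
                segments.append((speaker, " ".join(buffer)))
                buffer = []
            speaker = match.group("speaker").strip()
            line = match.group("text").strip()
            if not line:
                continue
        buffer.append(line)
    if buffer:
        segments.append((speaker, " ".join(buffer)))
    return segments


def chunk_text(text: str, max_words: int = 500) -> List[str]:
    """Split text at sentence boundaries into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in SENTENCE_RE.split(text.strip()):
        if not sentence:
            continue
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at an exact word count keeps each chunk prosodically complete, which tends to matter more for TTS output than uniform chunk length.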

Quick Start & Requirements

  • Installation: The recommended route is Install-Update.bat (Windows), which automates setup of the Python environment, PyTorch with CUDA, and the remaining dependencies. Manual setup requires installing espeak-ng and the Rubber Band CLI, creating a Python virtual environment, installing a PyTorch build that matches your CUDA version, and running pip install -r requirements.txt.
  • Prerequisites: Python 3.9+; an NVIDIA GPU with CUDA (optional, needed only for local inference; CUDA 12.x and 11.8 are supported); an internet connection.
  • Running: Execute run.bat (Windows) or python app.py. Access via http://localhost:5000.
  • Scripts: Install-Update.bat (setup and updates) and run.bat (launch) are included locally.
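The prerequisites listed above can be checked before launching the app. This hypothetical helper is not part of TTS-Story; it uses the standard library to test the Python version and whether the espeak-ng and Rubber Band executables are on PATH, assuming the binaries are named `espeak-ng` and `rubberband`, and reports CUDA only if PyTorch is already installed.

```python
import shutil
import sys
from typing import Dict

# Assumed executable names: espeak-ng installs "espeak-ng" and the
# Rubber Band CLI is usually installed as "rubberband".
REQUIRED_TOOLS = ("espeak-ng", "rubberband")
MIN_PYTHON = (3, 9)


def check_prerequisites() -> Dict[str, bool]:
    """Report which of the documented prerequisites are present."""
    report = {"python>=3.9": sys.version_info >= MIN_PYTHON}
    for tool in REQUIRED_TOOLS:
        report[tool] = shutil.which(tool) is not None
    try:
        import torch  # optional: only needed for local GPU inference
        report["cuda"] = torch.cuda.is_available()
    except ImportError:
        report["cuda"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        status = "OK" if ok else "MISSING"
        print(f"{status:8} {name}")
```

A missing CUDA entry is not an error on its own: per the prerequisites, the GPU is optional and the Replicate backends can be used instead.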

Highlighted Details

  • Multi-Engine Flexibility: Integrates twelve TTS engines with local GPU and Replicate API options.
  • Advanced Voice Cloning: Supports cloning voices using Chatterbox, VoxCPM, Qwen3 TTS, and IndexTTS from 10-15 second audio prompts.
  • M4B Audiobook Export: Generates M4B files with chapter markers, cover art, and configurable bitrate, using parallel AAC encoding to speed up export.
  • AI Text Pre-processing: Uses the Gemini API to tidy scripts, keep speaker tags consistent, and transform content.
  • Performance: Kokoro on a local GPU takes roughly 2 s per 500-word chunk; parallel processing substantially speeds up audiobook merging.
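How TTS-Story writes its chapter markers is not stated. One common approach, shown here as an assumption rather than the project's method, is to derive each chapter's start and end times from the durations of its rendered audio and write them to an FFmpeg FFMETADATA file. The function name `build_chapter_metadata` is hypothetical.

```python
from typing import List, Tuple


def _escape(value: str) -> str:
    """Backslash-escape characters FFMETADATA treats specially (\\ = ; # newline)."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first to avoid double-escaping
        value = value.replace(ch, "\\" + ch)
    return value


def build_chapter_metadata(chapters: List[Tuple[str, float]], title: str) -> str:
    """Build an FFMETADATA1 document from (chapter title, duration in seconds) pairs.

    Times are written in milliseconds (TIMEBASE=1/1000); each chapter starts
    where the previous one ended.
    """
    lines = [";FFMETADATA1", f"title={_escape(title)}"]
    start_ms = 0
    for name, seconds in chapters:
        end_ms = start_ms + round(seconds * 1000)
        lines += [
            "",
            "[CHAPTER]",
            "TIMEBASE=1/1000",
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={_escape(name)}",
        ]
        start_ms = end_ms
    return "\n".join(lines) + "\n"
```

FFmpeg's documentation shows a metadata file being applied by passing it as a second input with `-map_metadata 1` and copying the streams; whether TTS-Story uses this route or another muxer is not specified.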

Maintenance & Community

No explicit community channels (Discord/Slack), sponsorships, or roadmap details were found. Development is supported via voluntary contributions.

Licensing & Compatibility

Licensed under Apache 2.0, consistent with the Kokoro-82M engine. This license permits commercial use and integration into closed-source projects.

Limitations & Caveats

Local GPU inference can fail with "CUDA out of memory" errors; the remedy is to reduce chunk_size or fall back to an API backend. IndexTTS runs in an isolated environment to avoid dependency conflicts. The primary installation script is Windows-specific.
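The chunk_size-reduction and API-fallback remedy can be sketched as a retry loop. This is a hypothetical illustration, not TTS-Story's code: `synthesize` and `fallback` are placeholder callables for a local engine and a cloud backend, and `OutOfMemory` stands in for the real GPU error (`torch.cuda.OutOfMemoryError` in recent PyTorch releases) so the sketch runs without a GPU.

```python
from typing import Callable, List, Optional


class OutOfMemory(Exception):
    """Hypothetical stand-in for a GPU out-of-memory error."""


def synthesize_with_backoff(
    text: str,
    synthesize: Callable[[str], bytes],
    chunk_size: int = 500,
    min_chunk_size: int = 50,
    fallback: Optional[Callable[[str], bytes]] = None,
) -> List[bytes]:
    """Synthesize text in word chunks, halving chunk_size after each OOM.

    When halving would drop below min_chunk_size, the current chunk is sent
    to `fallback` (e.g. an API backend) if given; otherwise the error is re-raised.
    """
    words = text.split()
    pos = 0
    outputs: List[bytes] = []
    while pos < len(words):
        chunk = " ".join(words[pos:pos + chunk_size])
        try:
            outputs.append(synthesize(chunk))
        except OutOfMemory:
            if chunk_size // 2 >= min_chunk_size:
                chunk_size //= 2  # retry the same position with a smaller chunk
                continue
            if fallback is None:
                raise
            outputs.append(fallback(chunk))
        pos += chunk_size  # advance past the words just synthesized
    return outputs
```

Halving rather than decrementing the chunk size reaches a size that fits in memory within a few retries, and the smaller size is kept for the remaining text so the same error is not hit on every chunk.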

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 20 stars in the last 30 days