randombk/chatterbox-vllm: vLLM-accelerated TTS generation
This project ports the Chatterbox Text-to-Speech (TTS) model to the vLLM inference engine, targeting users who need significantly better performance and GPU memory efficiency for high-throughput TTS generation. It offers basic speech cloning with audio and text conditioning, including controllable exaggeration and classifier-free guidance (CFG).
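Classifier-free guidance steers generation by mixing the model's conditional and unconditional predictions at each decoding step. A generic sketch of the guidance rule follows; this illustrates the technique only and is not the project's actual code:

```python
def cfg_mix(cond_logits, uncond_logits, cfg_weight):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by cfg_weight.

    cfg_weight = 0 ignores the condition entirely; 1 uses the
    conditional logits as-is; > 1 exaggerates the conditioning.
    """
    return [u + cfg_weight * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]

# Example: with weight 2.0, the gap between conditional and
# unconditional logits is doubled.
mixed = cfg_mix([1.0, 2.0], [0.0, 0.0], 2.0)
```

In practice the guidance weight is a user-facing knob: larger values make the output adhere more strongly to the conditioning audio/text at some cost to naturalness.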
How It Works
The port leverages vLLM's optimized inference stack to eliminate the CPU-GPU synchronization bottlenecks of the original Hugging Face Transformers implementation. It achieves substantial speedups by integrating Chatterbox's two-stage pipeline, T3 Llama-based speech-token generation followed by S3Gen waveform generation, with vLLM's PagedAttention mechanism, enabling more efficient GPU utilization and batching.
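The two-stage flow can be sketched as follows. This is a toy illustration of the pipeline shape only; both functions are stand-ins, not the project's API (in the real system, stage 1 is a Llama-based model served by vLLM that emits discrete speech tokens, and stage 2 is the S3Gen model that renders those tokens to audio):

```python
def t3_generate_tokens(text: str) -> list[int]:
    # Stage 1 stand-in: maps text (plus conditioning, omitted here)
    # to a sequence of discrete speech tokens.
    return [ord(c) % 256 for c in text]

def s3gen_waveform(tokens: list[int]) -> list[float]:
    # Stage 2 stand-in: turns speech tokens into audio samples.
    return [t / 255.0 for t in tokens]

# End-to-end: text -> tokens -> waveform samples.
wav = s3gen_waveform(t3_generate_tokens("hello"))
```

The performance win comes from stage 1: autoregressive token generation dominates runtime, so serving it through vLLM's batched, paged-KV-cache execution is where the throughput gains accrue.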
Quick Start & Requirements
Clone the repository and set up the environment with uv (or pip):

git clone https://github.com/randombk/chatterbox-vllm.git
cd chatterbox-vllm
uv venv
source .venv/bin/activate
uv sync

Requires a compatible vLLM version (tested with 0.9.2). Run python example-tts.py for a sample generation, or benchmark.py to measure throughput.
Maintenance & Community
This is a personal project. Updates and discussions regarding vLLM limitations can be followed at https://github.com/vllm-project/vllm/issues/21989.
Licensing & Compatibility
The repository does not explicitly state a license. The original Chatterbox project is Apache 2.0 licensed. Compatibility for commercial or closed-source use is not specified.
Limitations & Caveats
The project relies on internal vLLM APIs and "hacky workarounds," making it dependent on specific vLLM versions (tested with 0.9.2) and potentially unstable. Learned speech positional embeddings are not yet applied, though quality degradation is reportedly minimal. A server API is out of scope, and the project's own APIs are not yet stable.