Accelerated TTS inference with vLLM
This project enhances IndexTTS, a text-to-speech system, by integrating vLLM for significantly faster inference. It targets researchers and developers needing high-throughput TTS capabilities, offering substantial speedups and improved concurrency for GPT model decoding.
How It Works
The project leverages vLLM's optimized inference engine to accelerate the GPT model component of IndexTTS. This approach utilizes techniques like paged attention and continuous batching to maximize GPU utilization and throughput, resulting in a ~3x reduction in Real-Time Factor (RTF) and a ~3x increase in GPT decoding speed compared to the original IndexTTS.
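To make the RTF claim concrete: Real-Time Factor is synthesis time divided by the duration of the audio produced, so values below 1 mean faster-than-real-time synthesis, and a ~3x RTF reduction means the same clip renders about three times faster. A minimal sketch (the timing numbers below are illustrative, not measured):

```python
def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: time spent synthesizing divided by audio duration.
    RTF < 1 means the engine runs faster than real time."""
    return synthesis_seconds / audio_seconds

# Illustrative numbers only: suppose the original engine takes 1.8 s
# to render a 3 s clip, and the vLLM-backed engine takes 0.6 s.
baseline = rtf(1.8, 3.0)     # RTF 0.6
accelerated = rtf(0.6, 3.0)  # RTF 0.2, i.e. a 3x reduction
print(baseline / accelerated)
```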
Quick Start & Requirements
Create a conda environment (conda create -n index-tts-vllm python=3.12), activate it, and install dependencies (pip install -r requirements.txt). Convert the model weights with bash convert_hf_format.sh /path/to/your/model_dir, then launch the web UI with VLLM_USE_V1=0 python webui.py or the API server with VLLM_USE_V1=0 python api_server.py --model_dir /your/path/to/Index-TTS. Initial startup may involve CUDA kernel compilation for BigVGAN.
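Once api_server.py is up, it can be queried over HTTP. The sketch below builds such a request; the host, port, route (/tts), and JSON field names are assumptions, since the excerpt above does not document the HTTP API, so check the project's source for the real ones:

```python
import json
import urllib.request

def build_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the API server.
    The "/tts" route and the "text" field are assumptions, not documented
    in the summary above; adjust to match api_server.py."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_request("http://127.0.0.1:8000", "Hello from IndexTTS")
print(req.full_url)  # http://127.0.0.1:8000/tts
# To actually send it (server must be running):
# with urllib.request.urlopen(req) as resp:
#     audio = resp.read()
```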
Maintenance & Community
No specific community channels or notable contributors are mentioned in the README.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README notes that using multiple reference audio inputs for multi-character mixing can produce an unstable output voice. The VLLM_USE_V1=0 flag is mandatory for the provided scripts, indicating potential incompatibility with vLLM's v1 engine.