Async acceleration for CosyVoice2 inference
Top 70.4% on SourcePulse
Async CosyVoice accelerates CosyVoice2 inference for Linux users by leveraging vLLM for LLM acceleration and multi-estimator instances for the Flow component. This results in significantly reduced inference times (RTF from 0.25-0.30 to 0.1-0.15) and improved concurrency, making it suitable for researchers and developers needing faster text-to-speech generation.
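RTF here is the real-time factor: processing time divided by the duration of the audio produced, so values below 1 mean faster-than-real-time synthesis. A small illustration of what the quoted numbers mean (the helper name is ours, not part of the project):

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent generating / duration of generated audio.
    RTF < 1 means synthesis runs faster than real time."""
    return processing_seconds / audio_seconds

# At RTF 0.30, a 10 s clip takes about 3 s to generate;
# at RTF 0.10 the same clip takes about 1 s.
```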
How It Works
The project integrates vLLM to speed up the Large Language Model (LLM) portion of CosyVoice2 inference. The "Flow" component uses the official load_jit or load_trt loading modes, enhanced by multiple estimator instances contributed by hexisyztem. This hybrid approach aims to maximize throughput and minimize latency by optimizing both the language generation and the acoustic modeling stages.
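The multi-estimator idea can be sketched with plain asyncio: keep several estimator instances and round-robin incoming requests across them, so one in-flight flow-matching call does not serialize all other requests. This is an illustration of the concurrency pattern only; the class and function names are ours, not the project's API:

```python
import asyncio
from itertools import cycle

class EstimatorPool:
    """Round-robin requests over several estimator instances;
    each instance handles one request at a time."""
    def __init__(self, estimators):
        self._slots = [(est, asyncio.Lock()) for est in estimators]
        self._order = cycle(range(len(self._slots)))

    async def run(self, tokens):
        est, lock = self._slots[next(self._order)]
        async with lock:  # serialize per instance, not globally
            return await est(tokens)

async def fake_estimator(tokens):
    await asyncio.sleep(0.01)  # stand-in for flow-matching compute
    return len(tokens)

async def main():
    pool = EstimatorPool([fake_estimator] * 3)
    # three requests proceed concurrently on three instances
    return await asyncio.gather(
        pool.run([1, 2]), pool.run([1, 2, 3]), pool.run([1]))
```

With a single estimator instance the three calls above would queue behind one lock; with three instances they overlap, which is the concurrency gain the project attributes to multiple estimators.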
Quick Start & Requirements
Clone the CosyVoice repository and place async_cosyvoice within it. Install dependencies via pip install -r requirements.txt in the async_cosyvoice directory. Pinned dependencies include vllm==0.7.3, torch==2.5.1, and onnxruntime-gpu==1.19.0. The CosyVoice2-0.5B model files must be downloaded separately. Runtime settings are configured in config.py.
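Collected as a requirements fragment (the repository's actual requirements.txt likely pins additional packages beyond these three):

```text
vllm==0.7.3
torch==2.5.1
onnxruntime-gpu==1.19.0
```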
Highlighted Details
Supports a precomputed spk2info.pt file for faster inference by skipping prompt audio processing.
Maintenance & Community
Licensing & Compatibility
No license is explicitly stated for async_cosyvoice. It depends on the licensing of the original CosyVoice project and vLLM.
Limitations & Caveats
This project is Linux-only and requires specific versions of CUDA, PyTorch, and vLLM. It also mandates a "warm-up" period of over 10 inference tasks to preheat the TRT model.
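The warm-up requirement can be satisfied with a trivial loop before serving traffic. A minimal sketch, assuming a hypothetical synthesize() entry point (not the project's actual API):

```python
def warm_up(synthesize, n_runs=12, text="This is a warm-up sentence."):
    """Fire dummy requests so the TRT engine finishes kernel selection
    and tuning before real traffic arrives; the project recommends
    more than 10 runs."""
    for _ in range(n_runs):
        synthesize(text)
    return n_runs
```

In a real deployment, synthesize would be the TTS call, and if the engine uses dynamic shapes, warm-up texts of varying lengths may be needed to cover the relevant shape range.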
Last activity: 4 months ago (marked inactive).