qi-hua: Async acceleration for CosyVoice2 inference
Top 67.7% on SourcePulse
Async CosyVoice accelerates CosyVoice2 inference for Linux users by leveraging vLLM for LLM acceleration and multi-estimator instances for the Flow component. This results in significantly reduced inference times (RTF from 0.25-0.30 to 0.1-0.15) and improved concurrency, making it suitable for researchers and developers needing faster text-to-speech generation.
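RTF (real-time factor) is processing time divided by generated audio duration, so values below 1.0 mean faster-than-real-time synthesis. A small illustration of the reported improvement (the clip length is an arbitrary example, not a project benchmark):

```python
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: seconds of compute per second of generated audio."""
    return processing_seconds / audio_seconds

# Baseline CosyVoice2: RTF ~0.27 means a 10 s clip takes ~2.7 s to generate.
baseline = rtf(2.7, 10.0)
# With vLLM + multi-estimator Flow: RTF ~0.12, ~1.2 s for the same clip.
accelerated = rtf(1.2, 10.0)

speedup = baseline / accelerated
print(f"baseline RTF={baseline:.2f}, accelerated RTF={accelerated:.2f}, "
      f"speedup ~{speedup:.1f}x")
```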
How It Works
The project integrates vLLM to speed up the Large Language Model (LLM) portion of CosyVoice2 inference. The "Flow" component utilizes official load_jit or load_trt modes, enhanced by multiple estimator instances provided by hexisyztem. This hybrid approach aims to maximize throughput and minimize latency by optimizing both the language generation and the acoustic modeling stages.
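The multi-estimator idea can be sketched as a pool of Flow instances shared by concurrent requests, so one slow acoustic-model call does not serialize the others. This is a minimal asyncio sketch of the pattern, not the project's actual API; `flow_estimate`, `EstimatorPool`, and all names here are illustrative assumptions:

```python
import asyncio

async def flow_estimate(instance_id: int, tokens: str) -> str:
    """Stand-in for the Flow acoustic-model step on one estimator instance."""
    await asyncio.sleep(0.01)  # simulated compute
    return f"mel({tokens})@est{instance_id}"

class EstimatorPool:
    """Round-robin pool of N Flow estimator instances."""
    def __init__(self, n: int):
        self._free: asyncio.Queue = asyncio.Queue()
        for i in range(n):
            self._free.put_nowait(i)

    async def run(self, tokens: str) -> str:
        inst = await self._free.get()        # borrow an instance
        try:
            return await flow_estimate(inst, tokens)
        finally:
            self._free.put_nowait(inst)      # return it to the pool

async def main() -> list:
    pool = EstimatorPool(n=3)
    # Six concurrent requests share three estimator instances.
    return await asyncio.gather(*(pool.run(f"t{i}") for i in range(6)))

results = asyncio.run(main())
print(results)
```

With a single estimator the six requests would run back to back; the pool lets three proceed at a time, which is the concurrency gain the project attributes to multiple estimator instances.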
Quick Start & Requirements
Place the async_cosyvoice directory inside the CosyVoice repository, then install dependencies via pip install -r requirements.txt from the async_cosyvoice directory. Pinned dependencies include vllm==0.7.3, torch==2.5.1, and onnxruntime-gpu==1.19.0. The CosyVoice2-0.5B model files must be downloaded separately, and runtime settings are configured in config.py.
Highlighted Details
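The project supports a precomputed spk2info.pt so inference can skip prompt-audio processing for known speakers. This is a minimal in-memory sketch of that caching idea; the real project persists the cache as a torch file, and the class and function names below are assumptions, not the project's API:

```python
def process_prompt_audio(audio: bytes) -> dict:
    """Expensive stand-in for extracting speaker features from prompt audio."""
    return {"embedding": [len(audio)], "processed": True}

class SpeakerCache:
    """Caches per-speaker info so repeat requests skip prompt-audio work."""
    def __init__(self):
        self._spk2info: dict = {}

    def get_info(self, speaker_id: str, audio: bytes = None) -> dict:
        if speaker_id in self._spk2info:
            return self._spk2info[speaker_id]      # cache hit: no audio work
        if audio is None:
            raise KeyError(f"no cached info for {speaker_id!r} and no audio given")
        info = process_prompt_audio(audio)
        self._spk2info[speaker_id] = info
        return info

cache = SpeakerCache()
cache.get_info("alice", b"prompt wav bytes")   # first call processes the audio
info = cache.get_info("alice")                 # later calls hit the cache
print(info["processed"])
```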
A precomputed spk2info.pt can be supplied for faster inference by skipping prompt audio processing.
Maintenance & Community
Licensing & Compatibility
No standalone license is listed for async_cosyvoice; licensing depends on the original CosyVoice project and vLLM.
Limitations & Caveats
This project is Linux-only and requires specific versions of CUDA, PyTorch, and vLLM. It also mandates a "warm-up" period of over 10 inference tasks to preheat the TRT model.
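The warm-up requirement can be handled with a simple loop of dummy inference tasks before serving real traffic. A sketch of that pattern, using a stand-in model rather than the project's TRT engine (all names here are assumptions):

```python
class FakeTrtModel:
    """Stand-in for a TRT-backed model that needs preheating."""
    def __init__(self):
        self.calls = 0

    def infer(self, text: str) -> str:
        self.calls += 1
        return f"audio({text})"

    @property
    def warmed_up(self) -> bool:
        return self.calls > 10   # the listing mandates >10 warm-up tasks

def warm_up(model: FakeTrtModel, n_tasks: int = 12) -> None:
    """Issue dummy inference tasks until the engine is preheated."""
    for i in range(n_tasks):
        model.infer(f"warmup sample {i}")

model = FakeTrtModel()
warm_up(model)
print(model.warmed_up)
```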
Last updated 6 months ago; the project is marked inactive.
triton-inference-server
b4rtaz
Lightning-AI
sanchit-gandhi
ai-dynamo