ZipVoice by k2-fsa

Fast zero-shot TTS with flow matching

Created 8 months ago

854 stars

Top 41.7% on SourcePulse

Project Summary

ZipVoice offers fast, high-quality zero-shot text-to-speech (TTS) and spoken dialogue generation using flow matching. It targets researchers and developers needing efficient, natural-sounding voice cloning and multi-speaker conversations, supporting both Chinese and English.

How It Works

ZipVoice uses flow matching, a generative modeling technique, to reach state-of-the-art TTS performance. Because synthesis is non-autoregressive, inference is faster than with traditional autoregressive models. The models are also small: the base version has only 123M parameters, which makes it suitable for deployment in resource-constrained environments.
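To make the idea of flow-matching sampling concrete, here is a minimal sketch in Python. It is not ZipVoice's code: the velocity field is a toy stand-in for a trained network, and the names euler_sample, toy_velocity, and target are hypothetical. A flow-matching model learns a velocity field, and sampling integrates that field from noise (t = 0) to data (t = 1) in a small, fixed number of steps, rather than generating one token at a time.

```python
import numpy as np


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity(x, t)
    return x


# Hypothetical "data" point the flow should reach. In a real TTS system this
# role is played by acoustic features predicted by a trained network.
target = np.array([0.5, -1.0, 2.0])


def toy_velocity(x, t):
    # Assumption: an idealized velocity for the straight-line path
    # x_t = (1 - t) * x_0 + t * x_1, which points from x toward the target.
    return (target - x) / (1.0 - t)


rng = np.random.default_rng(0)
noise = rng.standard_normal(target.shape)  # starting point at t = 0
sample = euler_sample(toy_velocity, noise, num_steps=8)
print(sample)
```

With this idealized field, the eight Euler steps carry the noise vector onto the target. The cost of sampling is set by the number of solver steps, not by the length of the output sequence, which is the general reason non-autoregressive flow-matching models can be fast.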

Quick Start & Requirements

Clone the repository, then install the dependencies with pip install -r requirements.txt.
Optional: Install k2 for training and faster inference, ensuring compatibility with PyTorch and CUDA versions (e.g., pip install k2==1.24.4.dev20250208+cuda12.1.torch2.5.1 -f https://k2-fsa.github.io/k2/cuda.html).
Inference commands are provided for single-speaker and dialogue generation.
See k2-fsa.org/get-started/k2/ for k2 installation details.
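The k2 version string in the example above encodes the CUDA and PyTorch versions the wheel was built for. The following Python sketch shows one way to check such a string against a local environment before installing. The helper names parse_k2_version and is_compatible are hypothetical, and the parser assumes only the CUDA tag layout seen in the example (other wheel variants may use a different layout).

```python
import re


def parse_k2_version(version: str) -> dict:
    """Split a k2 wheel version like '1.24.4.dev20250208+cuda12.1.torch2.5.1'."""
    pattern = r"(?P<k2>[^+]+)\+cuda(?P<cuda>\d+\.\d+)\.torch(?P<torch>\d+\.\d+\.\d+)"
    match = re.fullmatch(pattern, version)
    if match is None:
        raise ValueError(f"unrecognized k2 version string: {version!r}")
    return match.groupdict()


def is_compatible(k2_version: str, torch_version: str, cuda_version: str) -> bool:
    """Return True if the wheel tag matches the given PyTorch and CUDA versions."""
    tag = parse_k2_version(k2_version)
    # PyTorch versions may carry a local suffix such as '+cu121'; drop it.
    torch_base = torch_version.split("+")[0]
    return tag["torch"] == torch_base and tag["cuda"] == cuda_version


wheel = "1.24.4.dev20250208+cuda12.1.torch2.5.1"
print(parse_k2_version(wheel))
print(is_compatible(wheel, "2.5.1+cu121", "12.1"))
```

In an environment where PyTorch is already installed, torch.__version__ and torch.version.cuda could be passed as the second and third arguments.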

Highlighted Details

Supports zero-shot single-speaker TTS and multi-channel spoken dialogue generation.
Offers a distilled model (ZipVoice-Distill) for improved speed with minimal performance loss.
Includes specialized models for dialogue generation (ZipVoice-Dialog, ZipVoice-Dialog-Stereo).
Allows manual correction of Chinese polyphone pronunciations using special tokens.

Maintenance & Community

The project is actively developed by k2-fsa, with recent releases of dialogue models and the OpenDialog dataset. Discussions can be held via GitHub Issues.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README does not specify any explicit limitations or known bugs. Installation of k2 requires careful attention to PyTorch and CUDA version compatibility.
