ZipVoice by k2-fsa

Fast zero-shot TTS with flow matching

created 1 month ago
336 stars

Top 83.0% on sourcepulse

Project Summary

ZipVoice offers fast, high-quality zero-shot text-to-speech (TTS) and spoken dialogue generation using flow matching. It targets researchers and developers needing efficient, natural-sounding voice cloning and multi-speaker conversations, supporting both Chinese and English.

How It Works

ZipVoice is built on flow matching, a generative modeling technique in which a network learns a velocity field that transports noise to speech features. Synthesis is non-autoregressive: the whole utterance is refined in a fixed number of solver steps rather than token by token, which gives faster inference than autoregressive TTS models. The models are also small (the base version has 123M parameters), which makes them suitable for deployment in resource-constrained environments.
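
The sampling idea can be illustrated with a toy sketch. It is not ZipVoice's code: the learned network is replaced by the closed-form velocity of a straight-line path toward a known target, and the names toy_velocity and euler_sample, the Euler solver, and the step count are all assumptions made for illustration. What it shows is that every position is updated together at each step, so the number of steps does not grow with sequence length.

```python
import random


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity network (hypothetical).

    For the straight-line path x_t = (1 - t) * x0 + t * x1, the velocity
    that carries the current point x to the target x1 is (x1 - x) / (1 - t).
    """
    return [(tgt - xi) / (1.0 - t) for xi, tgt in zip(x, target)]


def euler_sample(target, num_steps=8, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise, one value per output position.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = toy_velocity(x, t, target)
        # All positions move in the same step: no left-to-right dependency.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    features = [0.5, -1.0, 2.0]  # made-up "speech features" for the demo
    print(euler_sample(features, num_steps=8))
```

With this toy velocity the sample lands on the target up to floating-point rounding; a real model would instead predict the velocity from text and a voice prompt.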

Quick Start & Requirements

  • Clone the repository, then install the dependencies with pip install -r requirements.txt.
  • Optional: Install k2 for training and faster inference, ensuring compatibility with PyTorch and CUDA versions (e.g., pip install k2==1.24.4.dev20250208+cuda12.1.torch2.5.1 -f https://k2-fsa.github.io/k2/cuda.html).
  • The README provides inference commands for both single-speaker and dialogue generation.
  • See k2-fsa.org/get-started/k2/ for k2 installation details.
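
The k2 version string in the example above appears to encode the CUDA and PyTorch versions the wheel was built for. A minimal sketch of a compatibility check follows, assuming that the pattern <k2 version>+cuda<X>.torch<Y> holds in general (it is inferred from the single example here). The helper names parse_k2_version and matches_environment are hypothetical and not part of k2.

```python
import re

# Pattern inferred from the one example string; treat it as an assumption.
_K2_WHEEL = re.compile(
    r"(?P<k2>[^+]+)\+cuda(?P<cuda>\d+(?:\.\d+)*)\.torch(?P<torch>\d+(?:\.\d+)*)"
)


def parse_k2_version(version):
    """Split a k2 wheel version into its k2, CUDA, and PyTorch parts."""
    match = _K2_WHEEL.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized k2 version string: {version!r}")
    return match.groupdict()


def matches_environment(k2_version, torch_version, cuda_version):
    """Return True if the wheel targets the given PyTorch and CUDA versions."""
    tags = parse_k2_version(k2_version)
    # PyTorch versions may carry a local suffix such as "+cu121"; drop it.
    torch_base = torch_version.split("+", 1)[0]
    return tags["torch"] == torch_base and tags["cuda"] == cuda_version


if __name__ == "__main__":
    wheel = "1.24.4.dev20250208+cuda12.1.torch2.5.1"
    print(parse_k2_version(wheel))
    print(matches_environment(wheel, "2.5.1+cu121", "12.1"))
```

In a real environment the last two arguments would typically come from torch.__version__ and torch.version.cuda.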

Highlighted Details

  • Supports zero-shot single-speaker TTS and multi-channel spoken dialogue generation.
  • Offers a distilled model (ZipVoice-Distill) for improved speed with minimal performance loss.
  • Includes specialized models for dialogue generation (ZipVoice-Dialog, ZipVoice-Dialog-Stereo).
  • Allows manual correction of Chinese polyphone pronunciations using special tokens.

Maintenance & Community

The project is actively developed by k2-fsa, with recent releases of the dialogue models and the OpenDialog dataset. Discussion takes place in GitHub Issues.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README does not specify any explicit limitations or known bugs. Installation of k2 requires careful attention to PyTorch and CUDA version compatibility.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 12
  • Issues (30d): 26
  • Star history: 341 stars in the last 90 days
