ZipVoice by k2-fsa

Fast zero-shot TTS with flow matching

Created 3 months ago
610 stars

Top 53.8% on SourcePulse

Project Summary

ZipVoice offers fast, high-quality zero-shot text-to-speech (TTS) and spoken dialogue generation using flow matching. It targets researchers and developers needing efficient, natural-sounding voice cloning and multi-speaker conversations, supporting both Chinese and English.

How It Works

ZipVoice is built on flow matching, a generative modeling technique that learns a velocity field carrying random noise toward speech features; the project describes the resulting TTS quality as state-of-the-art. Synthesis is non-autoregressive: all frames are refined in parallel over a fixed number of solver steps instead of being generated one token at a time, which gives faster inference than traditional autoregressive models. The models are also small (the base version has 123M parameters), which makes them suitable for resource-constrained deployments.
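To make the sampling idea concrete, here is a minimal sketch of flow-matching inference with a fixed-step Euler solver. It is an illustration under stated assumptions, not ZipVoice's code: the names euler_sample and make_toy_velocity are hypothetical, and the toy velocity field (the closed-form field for a straight-line path to a single known target) stands in for the trained network. The solver and step count ZipVoice actually uses are not stated in this summary.

```python
import random


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        # Every dimension is updated in the same step: nothing is generated
        # element by element, which is what "non-autoregressive" refers to.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained model.

    Returns the velocity field of the straight-line path
    x_t = (1 - t) * x0 + t * target, which is (target - x) / (1 - t).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


rng = random.Random(0)
target = [0.5, -1.0, 2.0]                       # pretend "speech features"
noise = [rng.gauss(0.0, 1.0) for _ in target]   # starting point at t=0
sample = euler_sample(make_toy_velocity(target), noise, num_steps=8)
```

The cost of sampling is num_steps evaluations of the velocity function regardless of how long the output is, whereas an autoregressive model needs one evaluation per generated token.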

Quick Start & Requirements

  • Clone the repository, then install the dependencies with pip install -r requirements.txt.
  • Optional: Install k2 for training and faster inference, ensuring compatibility with PyTorch and CUDA versions (e.g., pip install k2==1.24.4.dev20250208+cuda12.1.torch2.5.1 -f https://k2-fsa.github.io/k2/cuda.html).
  • Inference commands are provided for single-speaker and dialogue generation.
  • See k2-fsa.org/get-started/k2/ for k2 installation details.
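Because the k2 version string encodes the CUDA and PyTorch versions it was built for, a mismatch can be caught before installing. The sketch below parses the example version shown above and compares it with a local environment. The function names are hypothetical, and the pattern is an assumption that covers only the +cudaX.Y.torchA.B.C form; other build variants would need a different pattern.

```python
import re

# Assumed layout: <k2 version>+cuda<CUDA version>.torch<PyTorch version>
K2_WHEEL_RE = re.compile(
    r"^(?P<k2>[^+]+)\+cuda(?P<cuda>\d+(?:\.\d+)*)\.torch(?P<torch>\d+(?:\.\d+)*)$"
)


def parse_k2_version(version):
    """Split a k2 version string into its k2, CUDA, and PyTorch parts."""
    match = K2_WHEEL_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized k2 version string: {version!r}")
    return match.groupdict()


def matches_environment(version, torch_version, cuda_version):
    """Return True if the k2 build targets the given PyTorch and CUDA versions."""
    tags = parse_k2_version(version)
    # A PyTorch version may carry a local suffix such as "+cu121"; ignore it.
    base_torch = torch_version.split("+")[0]
    return tags["torch"] == base_torch and tags["cuda"] == cuda_version


example = "1.24.4.dev20250208+cuda12.1.torch2.5.1"
tags = parse_k2_version(example)
```

In a real environment the two values to compare against would come from torch.__version__ and torch.version.cuda.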

Highlighted Details

  • Supports zero-shot single-speaker TTS and multi-channel spoken dialogue generation.
  • Offers a distilled model (ZipVoice-Distill) for improved speed with minimal performance loss.
  • Includes specialized models for dialogue generation (ZipVoice-Dialog, ZipVoice-Dialog-Stereo).
  • Allows manual correction of Chinese polyphone pronunciations using special tokens.

Maintenance & Community

k2-fsa actively develops the project and has recently released the dialogue models and the OpenDialog dataset. Questions and discussion go through GitHub Issues.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README does not specify any explicit limitations or known bugs. Installation of k2 requires careful attention to PyTorch and CUDA version compatibility.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 5
  • Issues (30d): 32
  • Star history: 138 stars in the last 30 days
