Zonos by Zyphra

Open-weight text-to-speech model for expressive, high-quality speech generation

Created 7 months ago
7,046 stars

Project Summary

Zonos-v0.1 is an open-weight text-to-speech model for highly natural, expressive speech generation, including zero-shot voice cloning. It is aimed at researchers and developers who need high-quality, controllable TTS, and its authors report quality comparable to leading commercial providers.

How It Works

Zonos first normalizes the input text and converts it to phonemes with eSpeak, then predicts DAC (Descript Audio Codec) tokens with either a transformer or a hybrid backbone. Generation can be conditioned on a speaker embedding or an audio prefix, which is what enables voice cloning, and on parameters such as speaking rate, pitch, audio quality, and emotions. The model natively outputs audio at 44 kHz.
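
Because the backbone predicts codec tokens rather than raw samples, the length of the output is set by how many DAC frames it generates. The sketch below converts between seconds of audio and frames. It assumes the 44.1 kHz DAC codec with a hop length of 512 samples (roughly 86 frames per second); those codec parameters come from the public DAC 44 kHz configuration and are not stated in this summary.

```python
import math

# Assumed codec parameters (public DAC 44 kHz config; not stated in the summary).
SAMPLE_RATE_HZ = 44_100
HOP_LENGTH = 512  # audio samples represented by one DAC frame


def frames_per_second(sample_rate: int = SAMPLE_RATE_HZ, hop: int = HOP_LENGTH) -> float:
    """Token frames the backbone must predict per second of audio."""
    return sample_rate / hop


def frames_for_duration(seconds: float) -> int:
    """Frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * SAMPLE_RATE_HZ / HOP_LENGTH)


def duration_for_frames(frames: int) -> float:
    """Seconds of audio obtained by decoding `frames` DAC frames."""
    return frames * HOP_LENGTH / SAMPLE_RATE_HZ


if __name__ == "__main__":
    print(f"{frames_per_second():.2f} frames/s")  # 86.13 frames/s
    print(frames_for_duration(30.0))              # 2584
    print(f"{duration_for_frames(2584):.2f} s")   # 30.00 s
```

Under these assumptions, a 30-second clip requires about 2,584 autoregressive steps, which is why generation speed depends so directly on the backbone.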

Quick Start & Requirements

  • Install: run uv sync, then uv pip install -e .; for the hybrid model, use uv sync --extra compile and uv pip install -e .[compile] instead.
  • Prerequisites: Linux (Ubuntu 22.04/24.04 recommended) or macOS; a GPU with at least 6 GB of VRAM (an Nvidia 3000-series or newer for the hybrid model); and the eSpeak-ng library.
  • Resources: CPU-only inference is possible but slow. A Docker-based installation is also available.
  • Demo: playground.zyphra.com/audio
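
The requirements above imply a simple decision: which backbone a given machine can run. The helper below is a hypothetical sketch of that decision, not part of Zonos. The checkpoint names match the project's Hugging Face releases (verify against the README), and mapping "Nvidia 3000-series or newer" onto CUDA compute capability 8.0+ is an assumption.

```python
import platform
import shutil
from dataclasses import dataclass
from typing import Optional, Tuple

# Checkpoint names as published on Hugging Face (verify against the README).
TRANSFORMER = "Zyphra/Zonos-v0.1-transformer"
HYBRID = "Zyphra/Zonos-v0.1-hybrid"

MIN_VRAM_GB = 6.0  # stated minimum GPU memory
AMPERE = (8, 0)    # assumption: 3000-series and newer report compute capability >= 8.0


@dataclass
class GpuInfo:
    vram_gb: float
    compute_capability: Tuple[int, int]


def choose_backend(gpu: Optional[GpuInfo]) -> Tuple[str, str]:
    """Hypothetical helper: return (checkpoint, device) under the stated requirements."""
    if gpu is None or gpu.vram_gb < MIN_VRAM_GB:
        # No usable GPU: fall back to the transformer on CPU (possible but slow).
        return TRANSFORMER, "cpu"
    if gpu.compute_capability >= AMPERE:
        return HYBRID, "cuda"
    # Per the listed requirements, only the hybrid needs a 3000-series or newer GPU.
    return TRANSFORMER, "cuda"


def host_ready() -> bool:
    """True on Linux/macOS when the espeak-ng executable is on PATH
    (used here as a proxy for the eSpeak-ng library being installed)."""
    return platform.system() in ("Linux", "Darwin") and shutil.which("espeak-ng") is not None


if __name__ == "__main__":
    print(choose_backend(GpuInfo(vram_gb=24, compute_capability=(8, 9))))  # RTX 4090
    print(choose_backend(None))
    print("host ready:", host_ready())
```

On a real machine, the `GpuInfo` fields could be filled from PyTorch's `torch.cuda.get_device_properties(0).total_memory / 1024**3` and `torch.cuda.get_device_capability(0)`.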

Highlighted Details

  • Zero-shot TTS with voice cloning from short audio samples.
  • Supports multilingual generation (English, Japanese, Chinese, French, German).
  • Fine-grained control over speaking rate, pitch, audio quality, and emotions (happiness, fear, sadness, anger).
  • Reported real-time factor of about 2x on an RTX 4090.
  • Includes a Gradio WebUI for easy speech generation.
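
One way to see how these controls fit together is to bundle them into a single validated request before generation. The sketch below is hypothetical and is not the Zonos API. The field names, units, short language tags, and the choice to normalize emotion weights so they sum to 1 are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative tags for the five listed languages; the identifiers the model
# actually expects are an assumption to check against the README.
SUPPORTED_LANGUAGES = {"en", "ja", "zh", "fr", "de"}
# Fixed order for the emotion controls named in the summary.
EMOTIONS = ("happiness", "fear", "sadness", "anger")


@dataclass
class TtsRequest:
    """Hypothetical bundle of the controls listed above; not the Zonos API."""
    text: str
    language: str = "en"
    speaking_rate: float = 1.0    # hypothetical: 1.0 = default pace
    pitch_variation: float = 1.0  # hypothetical: 1.0 = default variation
    emotions: Dict[str, float] = field(default_factory=dict)
    voice_sample: Optional[str] = None  # path to a short clip for zero-shot cloning

    def validate(self) -> None:
        """Raise ValueError if any field is outside the assumed ranges."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language!r}")
        if self.speaking_rate <= 0 or self.pitch_variation < 0:
            raise ValueError("rate must be positive and pitch variation non-negative")
        for name, weight in self.emotions.items():
            if name not in EMOTIONS:
                raise ValueError(f"unknown emotion: {name!r}")
            if weight < 0:
                raise ValueError(f"negative weight for {name!r}")

    def emotion_vector(self) -> List[float]:
        """Fixed-order weights scaled to sum to 1 (uniform if none are given)."""
        raw = [self.emotions.get(name, 0.0) for name in EMOTIONS]
        total = sum(raw)
        if total == 0:
            return [1.0 / len(EMOTIONS)] * len(EMOTIONS)
        return [w / total for w in raw]


if __name__ == "__main__":
    req = TtsRequest(text="Hello there!", emotions={"happiness": 3.0, "anger": 1.0})
    req.validate()
    print(req.emotion_vector())  # [0.75, 0.0, 0.0, 0.25]
```

Validating up front keeps malformed requests from reaching the model. The fixed emotion order makes the vector's layout predictable regardless of how the caller writes the dictionary.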

Maintenance & Community

  • No specific contributors, sponsorships, or community links (Discord/Slack, roadmap) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

  • Experimental Windows support is available through a fork. The hybrid model requires an Nvidia 3000-series or newer GPU. The README does not document any known bugs or deprecations.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 4
  • Star history: 148 stars in the last 30 days