seed-vc  by Plachtaa

CLI tool for zero-shot voice and singing voice conversion, with real-time support

created 11 months ago
2,786 stars

Top 17.5% on sourcepulse

Project Summary

Seed-VC offers zero-shot voice conversion (VC) and singing voice conversion (SVC) with real-time capabilities. It allows users to clone voices from short audio samples (1-30 seconds) without prior training, and supports fine-tuning with minimal data for improved performance. The project targets users needing voice transformation for applications like online meetings, gaming, and live streaming, as well as musicians and content creators.

How It Works

Seed-VC utilizes a U-ViT architecture with skip connections, incorporating OpenAI's Whisper as a speech content encoder and NVIDIA's BigVGAN or HIFT for vocoding. The V2 model introduces ASTRAL-Quantization for speaker-disentangled speech tokenization, enabling better accent and emotion conversion. The approach leverages diffusion models for high-quality audio generation, with configurable parameters for balancing speed, intelligibility, and similarity.

Quick Start & Requirements

  • Installation: pip install -r requirements.txt (Linux/Windows) or pip install -r requirements-mac.txt (Mac M Series). For Windows users, pip install triton-windows==3.2.0.post13 is recommended for V2 model speed-ups.
  • Prerequisites: Python 3.10+, GPU recommended for real-time performance.
  • Resources: Checkpoints are auto-downloaded.
  • Docs: Demo Page, Evaluation
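
The installation choice above can be sketched as a small pre-flight helper. This is a minimal sketch and not part of Seed-VC: the names `pick_requirements_file` and `python_is_supported` are hypothetical, and it assumes that "Mac M Series" corresponds to a `Darwin` system reporting an `arm64` machine type. Any other platform falls back to `requirements.txt`, which the summary only documents for Linux and Windows.

```python
import platform
import sys

MIN_PYTHON = (3, 10)  # taken from the prerequisites listed above


def pick_requirements_file(system: str, machine: str) -> str:
    """Return the requirements file named in the installation step."""
    # Assumption: Apple Silicon reports system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "requirements-mac.txt"
    # Assumption: everything else uses the Linux/Windows file.
    return "requirements.txt"


def python_is_supported(version=None) -> bool:
    """Check a (major, minor) version against the Python 3.10+ prerequisite."""
    current = sys.version_info[:2] if version is None else tuple(version[:2])
    return current >= MIN_PYTHON


if __name__ == "__main__":
    req = pick_requirements_file(platform.system(), platform.machine())
    print(f"pip install -r {req}")
    if not python_is_supported():
        print("Warning: Python 3.10+ is listed as a prerequisite.")
```

Running the script only prints the suggested command; it does not install anything.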

Highlighted Details

  • Supports zero-shot voice conversion, real-time VC, and singing voice conversion.
  • Fine-tuning works with as little as 1 utterance per speaker and is fast (about 2 minutes on a T4 GPU).
  • Real-time VC runs with roughly 300 ms of algorithm delay plus roughly 100 ms of device-side delay.
  • V2 model enhances voice and accent conversion, with better source speaker anonymization.
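
The real-time delay figures above can be combined into a rough end-to-end estimate. This is a back-of-the-envelope sketch that assumes the two delays simply add and ignores anything else in the audio path; the constant and function names are illustrative only.

```python
# Approximate figures quoted in the list above.
ALGORITHM_DELAY_MS = 300
DEVICE_DELAY_MS = 100


def total_delay_ms(algorithm_ms: int = ALGORITHM_DELAY_MS,
                   device_ms: int = DEVICE_DELAY_MS) -> int:
    """Estimate end-to-end delay, assuming the two components are additive."""
    return algorithm_ms + device_ms


if __name__ == "__main__":
    print(f"Estimated delay: ~{total_delay_ms()} ms")
```

Under that assumption the estimate comes to roughly 400 ms.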

Maintenance & Community

  • Active development with recent updates including V2 model release and Mac M Series support.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Real-time GUI on Mac may encounter _tkinter errors, requiring a Python installation with Tkinter support.
  • The README does not mention specific hardware requirements beyond recommending a GPU for real-time performance.
  • No explicit license information is provided, which could impact commercial adoption.
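
The Tkinter caveat above can be checked before launching a GUI. This is a minimal sketch using only the Python standard library; the helper name `has_tkinter` is hypothetical and not part of Seed-VC. It only reports whether the interpreter can locate the `_tkinter` extension module, which does not guarantee that a window can actually be opened.

```python
import importlib.util


def has_tkinter() -> bool:
    """Return True if this interpreter can locate the _tkinter extension."""
    # find_spec returns None when the module cannot be found.
    return importlib.util.find_spec("_tkinter") is not None


if __name__ == "__main__":
    if has_tkinter():
        print("Tkinter support detected.")
    else:
        print("No _tkinter module found; use a Python build with Tkinter support.")
```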

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 425 stars in the last 90 days