SoulX-Singer by Soul-AILab

Zero-shot singing voice synthesis for high-fidelity audio

Created 2 weeks ago

350 stars

Top 79.8% on SourcePulse

Summary

SoulX-Singer offers high-quality, zero-shot singing voice synthesis, enabling realistic voice generation for unseen singers without fine-tuning. It targets researchers and developers, providing flexible melody/rhythm control, timbre cloning across languages, and singing voice editing.

How It Works

SoulX-Singer is a zero-shot singing voice synthesis model trained on more than 42,000 hours of aligned vocal data. It accepts either melody-conditioned (F0) or score-conditioned (MIDI) input for control over pitch and rhythm. Because timbre is disentangled from content, a singer's timbre can be cloned and carried across languages, and lyrics can be edited while natural prosody is preserved.
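
To make the relationship between the two conditioning types concrete, here is a minimal sketch of how a note-level score (MIDI) can be turned into a frame-level melody contour (F0). It uses the standard equal-temperament formula with A4 = 440 Hz. The function names, the `(midi_note, start_sec, end_sec)` tuple layout, and the 100 frames-per-second rate are assumptions made for illustration; none of this is SoulX-Singer's own code or input format.

```python
A4_HZ = 440.0   # reference tuning for MIDI note 69 (A4)
A4_MIDI = 69


def midi_to_hz(note: float) -> float:
    """Equal-temperament frequency (Hz) of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def score_to_f0(notes, frame_rate: float = 100.0):
    """Rasterize (midi_note, start_sec, end_sec) tuples into an F0 contour.

    Hypothetical helper: frames not covered by any note stay at 0.0,
    which is used here to mark rests.
    """
    if not notes:
        return []
    total_sec = max(end for _, _, end in notes)
    n_frames = int(round(total_sec * frame_rate))
    f0 = [0.0] * n_frames
    for note, start, end in notes:
        first = int(round(start * frame_rate))
        last = min(int(round(end * frame_rate)), n_frames)
        hz = midi_to_hz(note)
        for i in range(first, last):
            f0[i] = hz
    return f0


if __name__ == "__main__":
    # Two short notes (A4, then C5) separated by a rest.
    contour = score_to_f0([(69, 0.0, 0.1), (72, 0.2, 0.3)])
    print(len(contour), contour[0], contour[10], round(contour[20], 2))
```

A score gives one pitch per note, while an F0 contour gives one pitch per frame, so a melody contour extracted from real singing can carry detail such as vibrato and glides that a rasterized score like this one cannot.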

Quick Start & Requirements

Highlighted Details

  • Zero-Shot Capability: Generates high-fidelity voices for unseen singers without fine-tuning.
  • Flexible Control: Supports melody (F0) and score (MIDI) conditioning for precise pitch and rhythm.
  • Large-Scale Dataset: Trained on over 42,000 hours of aligned vocal data.
  • Timbre Cloning & Cross-Lingual Synthesis: Preserves singer identity and enables synthesis across languages.
  • Singing Voice Editing: Modifies lyrics while maintaining natural prosody.

Maintenance & Community

Recent releases (February 2026) include an evaluation dataset, an online demo, a MIDI editor, and the inference code and models. The roadmap lists the WebUI and online demos as completed, with MIDI-based input support and comprehensive tutorials still planned. The maintainers can be reached by email (qianjiale@soulapp.cn, menghao@soulapp.cn, wangxinsheng@soulapp.cn), and discussion takes place in WeChat and Soul APP groups.

Licensing & Compatibility

Released under the Apache 2.0 license, which permits free use by researchers and developers. A usage disclaimer calls for responsible use that respects intellectual property and consent, and it prohibits impersonation and the creation of deceptive audio. The developers disclaim liability for misuse.

Limitations & Caveats

The automatic preprocessing pipeline may yield imperfect alignment between singing audio and corresponding lyrics/musical notes, necessitating manual correction via the provided MIDI-Editor for optimal synthesis quality.
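
Before opening a file in the editor, it can help to flag the notes most likely to need attention. The sketch below is one hedged way to do that, assuming each aligned note is a dictionary with `lyric`, `start`, and `end` (seconds) keys. That layout, the function name, and the 30 ms minimum duration are hypothetical and are not taken from the project's preprocessing output.

```python
def find_alignment_issues(notes, min_dur: float = 0.03):
    """Return (index, reason) pairs for notes that look misaligned.

    Hypothetical check covering three simple symptoms: a note shorter
    than min_dur seconds, a note that starts before an earlier note
    has ended, and a note with no lyric attached.
    """
    issues = []
    # Keep original indices so results point back into the input list.
    ordered = sorted(enumerate(notes), key=lambda pair: pair[1]["start"])
    latest_end = None
    for idx, note in ordered:
        if note["end"] - note["start"] < min_dur:
            issues.append((idx, "too_short"))
        if latest_end is not None and note["start"] < latest_end:
            issues.append((idx, "overlaps_previous"))
        if not note.get("lyric", "").strip():
            issues.append((idx, "missing_lyric"))
        latest_end = note["end"] if latest_end is None else max(latest_end, note["end"])
    return issues


if __name__ == "__main__":
    example = [
        {"lyric": "la", "start": 0.0, "end": 0.5},
        {"lyric": "la", "start": 0.4, "end": 0.9},   # starts before the first note ends
        {"lyric": "", "start": 0.9, "end": 0.91},    # very short and has no lyric
    ]
    for idx, reason in find_alignment_issues(example):
        print(idx, reason)
```

A check like this only narrows down where to look; whether a flagged note is actually wrong still has to be judged by ear and corrected by hand.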

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 12
Star History: 354 stars in the last 19 days
