SoulX-Singer by Soul-AILab

Zero-shot singing voice synthesis for high-fidelity audio

Created 5 months ago

784 stars

Top 44.0% on SourcePulse

Project Summary

Summary

SoulX-Singer offers high-quality, zero-shot singing voice synthesis, enabling realistic voice generation for unseen singers without fine-tuning. It targets researchers and developers, providing flexible melody/rhythm control, timbre cloning across languages, and singing voice editing.

How It Works

SoulX-Singer is a zero-shot singing voice synthesis model trained on over 42,000 hours of aligned vocal data. It accepts two kinds of conditioning for precise control of pitch and rhythm: melody-conditioned (F0) input and score-conditioned (MIDI) input. Because timbre is disentangled from content, a singer's timbre can be cloned and carried across languages, and lyrics can be edited while natural prosody is preserved.
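To make the difference between the two conditioning types concrete, the sketch below converts a small score (MIDI notes with start and end times) into a frame-level F0 contour using the standard equal-temperament formula. This is not SoulX-Singer code: the tuple layout, the `notes_to_f0` helper, and the 10 ms hop size are assumptions chosen for illustration only.

```python
A4_HZ = 440.0   # standard concert pitch
A4_MIDI = 69    # MIDI note number of A4


def midi_to_hz(note: float) -> float:
    """Equal-temperament frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def notes_to_f0(notes, hop_s=0.01):
    """Render (midi_note, start_s, end_s) tuples into a frame-level F0 contour.

    Frames not covered by any note stay at 0.0, a common convention for
    unvoiced frames. The tuple layout and hop size are illustrative
    assumptions, not the project's input format.
    """
    if not notes:
        return []
    total_s = max(end for _, _, end in notes)
    n_frames = round(total_s / hop_s)
    f0 = [0.0] * n_frames
    for note, start, end in notes:
        first = round(start / hop_s)
        last = min(round(end / hop_s), n_frames)
        for i in range(first, last):
            f0[i] = midi_to_hz(note)
    return f0


# A two-note score: C4 for half a second, then E4 for half a second.
score = [(60, 0.0, 0.5), (64, 0.5, 1.0)]
contour = notes_to_f0(score)
```

A score gives discrete pitches and durations, while an F0 contour carries one frequency value per frame, so it can also represent vibrato and pitch slides that a flat rendering like this one cannot.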

Quick Start & Requirements

Install: Clone the repository, set up a Conda environment with Python 3.10 (conda create -n soulxsinger -y python=3.10), activate it, and install dependencies (pip install -r requirements.txt).
Prerequisites: Conda, Python 3.10.
Models: Download SVS and preprocessing models via Hugging Face Hub (hf download Soul-AILab/SoulX-Singer --local-dir pretrained_models/SoulX-Singer and hf download Soul-AILab/SoulX-Singer-Preprocess --local-dir pretrained_models/SoulX-Singer-Preprocess).
Run: Execute the inference demo with bash example/infer.sh or launch the interactive WebUI with python webui.py.
Links: Online Demo: https://huggingface.co/spaces/Soul-AILab/SoulX-Singer-Demo, MIDI Editor: https://huggingface.co/spaces/Soul-AILab/SoulX-Singer-MIDI-Editor, Eval Dataset: https://huggingface.co/datasets/Soul-AILab/SoulX-Singer-Eval.
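Before running the demo, it can help to confirm that the environment matches the steps above. The following is a minimal pre-flight sketch, assuming it is run from the repository root and that the models were downloaded to the two `--local-dir` targets listed above. The helper names are hypothetical and are not part of the project.

```python
import sys
from pathlib import Path

# Download targets taken from the `hf download ... --local-dir` commands above.
MODEL_DIRS = (
    "pretrained_models/SoulX-Singer",
    "pretrained_models/SoulX-Singer-Preprocess",
)


def missing_model_dirs(repo_root="."):
    """Return the expected model directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in MODEL_DIRS:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


def python_is_310(version_info=sys.version_info):
    """True when the interpreter is Python 3.10, the version named above."""
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    if not python_is_310():
        print("Warning: the quick start assumes Python 3.10")
    for rel in missing_model_dirs():
        print(f"Missing or empty model directory: {rel}")
```

If the script prints nothing, both model directories are present and the interpreter is 3.10, so bash example/infer.sh or python webui.py should at least find the files they expect.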

Highlighted Details

Zero-Shot Capability: Generates high-fidelity voices for unseen singers without fine-tuning.
Flexible Control: Supports melody (F0) and score (MIDI) conditioning for precise pitch and rhythm.
Large-Scale Dataset: Trained on over 42,000 hours of aligned vocal data.
Timbre Cloning & Cross-Lingual Synthesis: Preserves singer identity and enables synthesis across languages.
Singing Voice Editing: Modifies lyrics while maintaining natural prosody.

Maintenance & Community

Recent releases (February 2026) include an evaluation dataset, an online demo, a MIDI editor, and the inference code/models. The roadmap indicates completed items like the WebUI and online demos, with future plans for MIDI-based input support and comprehensive tutorials. Contact is available via email (qianjiale@soulapp.cn, menghao@soulapp.cn, wangxinsheng@soulapp.cn) and WeChat/Soul APP groups for discussions.

Licensing & Compatibility

Licensed under the Apache 2.0 license, which permits free use for research and development. A usage disclaimer calls for responsible use that respects intellectual property and consent, and it prohibits impersonation and the creation of deceptive audio. The developers disclaim liability for misuse.

Limitations & Caveats

The automatic preprocessing pipeline may yield imperfect alignment between singing audio and corresponding lyrics/musical notes, necessitating manual correction via the provided MIDI-Editor for optimal synthesis quality.
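A quick automated pass can narrow down where manual correction is most likely needed. The sketch below flags obviously suspicious notes in an aligned note list. The `Note` record, its field names, and the 2-second gap threshold are hypothetical; the actual output format of the preprocessing pipeline is not described here, so real data would have to be mapped onto this structure first.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical aligned note: one lyric unit with pitch and timing."""
    lyric: str
    midi: int
    start: float  # seconds
    end: float    # seconds


def find_alignment_issues(notes, max_gap_s=2.0):
    """Return (index, reason) pairs for notes that look misaligned.

    Assumes `notes` is sorted by start time. The checks are heuristics,
    not a substitute for listening and editing by hand.
    """
    issues = []
    for i, note in enumerate(notes):
        if note.end <= note.start:
            issues.append((i, "non-positive duration"))
        if not 0 <= note.midi <= 127:
            issues.append((i, "pitch outside MIDI range"))
        if i > 0:
            prev = notes[i - 1]
            if note.start < prev.end:
                issues.append((i, "overlaps previous note"))
            elif note.start - prev.end > max_gap_s:
                issues.append((i, "long gap before note"))
    return issues


aligned = [
    Note("la", 60, 0.0, 0.5),
    Note("la", 62, 0.4, 0.9),   # starts before the previous note ends
    Note("la", 64, 4.0, 4.0),   # zero duration after a long gap
]
issues = find_alignment_issues(aligned)
```

Notes flagged this way are candidates to open first in the MIDI Editor; notes that pass may still be misaligned in ways only audible on playback.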
