Soul-AILab
Realistic long-form podcast generation from text
Top 28.7% on SourcePulse
SoulX-Podcast is an inference codebase for generating high-fidelity, long-form podcasts from text. It targets users needing realistic multi-turn, multi-speaker dialogic speech synthesis, offering advanced features like cross-dialectal zero-shot voice cloning and paralinguistic controls for enhanced naturalness and personalization.
How It Works
The model performs multi-turn, multi-speaker dialogic speech synthesis for long-form podcasts. Paralinguistic controls (e.g., laughter, sighs) can be placed in the text to make the speech sound more natural. A key novelty is cross-dialectal, zero-shot voice cloning: given a short prompt audio sample of a speaker, it generates personalized speech in Mandarin, English, and several Chinese dialects (Sichuanese, Henanese, Cantonese).
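As a rough illustration of what a multi-speaker script with paralinguistic cues could look like, here is a minimal Python sketch. The `[S1]`-style speaker markers, the `<|laughter|>`-style tokens, and the names `Turn` and `render_script` are assumptions made for this example and are not confirmed by the summary above; check the repository for the input format the model actually accepts.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cue tokens, used here for illustration only.
PARALINGUISTIC_TAGS = {
    "laughter": "<|laughter|>",
    "sigh": "<|sigh|>",
}


@dataclass
class Turn:
    speaker: int                 # 1-based speaker index
    text: str                    # what the speaker says in this turn
    cue: Optional[str] = None    # optional key into PARALINGUISTIC_TAGS


def render_script(turns: List[Turn]) -> str:
    """Render dialogue turns as one line per turn, prefixed by a speaker marker."""
    lines = []
    for turn in turns:
        if turn.cue is not None and turn.cue not in PARALINGUISTIC_TAGS:
            raise ValueError(f"unknown paralinguistic cue: {turn.cue!r}")
        prefix = f"{PARALINGUISTIC_TAGS[turn.cue]} " if turn.cue else ""
        lines.append(f"[S{turn.speaker}] {prefix}{turn.text}")
    return "\n".join(lines)


script = render_script([
    Turn(1, "Welcome back to the show."),
    Turn(2, "Glad to be here.", cue="laughter"),
])
print(script)
```

Keeping the cues in a lookup table means an unsupported cue fails early with a clear error instead of being passed silently into the synthesis text.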
Quick Start & Requirements
Installation: clone the repository (git clone git@github.com:Soul-AILab/SoulX-Podcast.git), create a Conda environment with Python 3.11 (conda create -n soulxpodcast -y python=3.11), activate it (conda activate soulxpodcast), and install requirements (pip install -r requirements.txt).
Model download: use huggingface-cli or Python snapshot_download. Git LFS is required for git clone download.
Inference: run bash example/infer_dialogue.sh.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats