Audio foundation model for universal audio generation
UniAudio is a versatile framework for universal audio generation, designed to handle diverse tasks like Text-to-Speech (TTS), Voice Conversion (VC), and Text-to-Music with a single model. It targets researchers and developers aiming to build flexible audio generation systems.
How It Works
UniAudio employs a unified framework that processes audio and associated metadata as sequences of tokens. This approach allows a single model to learn and generate various audio types by defining task-specific data formats and tokenization strategies. The core advantage lies in its modular design, enabling users to define new tasks, prepare data, tokenize it, and train a universal model.
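The idea of defining a task as a data format over token sequences can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the special tokens, the task names, the vocabulary sizes, the field layouts in TASK_FORMATS, and the encode_example helper are hypothetical and are not taken from the UniAudio codebase. It only shows how several modalities might share one vocabulary so that a single sequence model can consume them.

```python
from typing import Dict, List, Tuple

# Hypothetical shared-vocabulary layout (illustrative values only).
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<sep>"]
TASKS = ["tts", "vc", "text_to_music"]
MODALITY_SIZES = {"phone": 100, "text": 1000, "audio": 1024}

# Hypothetical task formats: an ordered list of fields, with the
# generation target as the last field.
TASK_FORMATS = {
    "tts": ["phone", "audio"],            # phonemes -> speech tokens
    "vc": ["audio", "audio"],             # source speech -> converted speech
    "text_to_music": ["text", "audio"],   # text prompt -> music tokens
}


def build_offsets() -> Tuple[Dict[str, int], int]:
    """Give each token group its own ID range in one shared vocabulary."""
    offsets: Dict[str, int] = {}
    cursor = len(SPECIAL_TOKENS)
    offsets["task"] = cursor
    cursor += len(TASKS)
    for name, size in MODALITY_SIZES.items():
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


OFFSETS, VOCAB_SIZE = build_offsets()


def encode_example(task: str, fields: List[List[int]]) -> List[int]:
    """Flatten per-field token IDs into a single sequence for one task."""
    layout = TASK_FORMATS[task]
    if len(fields) != len(layout):
        raise ValueError(f"task {task!r} expects {len(layout)} fields")

    seq = [SPECIAL_TOKENS.index("<start>"), OFFSETS["task"] + TASKS.index(task)]
    for i, (modality, ids) in enumerate(zip(layout, fields)):
        size = MODALITY_SIZES[modality]
        if any(not 0 <= t < size for t in ids):
            raise ValueError(f"token out of range for modality {modality!r}")
        # Shift local IDs into the modality's range of the shared vocabulary.
        seq.extend(OFFSETS[modality] + t for t in ids)
        if i < len(layout) - 1:
            seq.append(SPECIAL_TOKENS.index("<sep>"))
    seq.append(SPECIAL_TOKENS.index("<end>"))
    return seq


if __name__ == "__main__":
    # Two phoneme IDs followed by three audio-codec IDs, framed as a TTS example.
    print(encode_example("tts", [[3, 5], [10, 11, 12]]))
```

Under this sketch, adding a task amounts to adding one entry to TASK_FORMATS (and a tokenizer for any new modality), which is the kind of modularity the paragraph above describes.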
Quick Start & Requirements
conda create -n uniaudio python=3.8
conda activate uniaudio
cd UniAudio
bash requirements.sh
bash UniAudio/download.sh
Data files: wav.scp, phone.scp, and utt2spk.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is still under active development, with some tasks (e.g., ASR, speaker verification) showing performance gaps compared to state-of-the-art systems. Comprehensive documentation and additional task examples are planned for future releases.