UniAudio by yangdongchao

Audio foundation model for universal audio generation

Created 2 years ago

605 stars

Top 54.0% on SourcePulse

Project Summary

UniAudio is a versatile framework for universal audio generation, designed to handle diverse tasks like Text-to-Speech (TTS), Voice Conversion (VC), and Text-to-Music with a single model. It targets researchers and developers aiming to build flexible audio generation systems.

How It Works

UniAudio employs a unified framework that processes audio and associated metadata as sequences of tokens. This approach allows a single model to learn and generate various audio types by defining task-specific data formats and tokenization strategies. The core advantage lies in its modular design, enabling users to define new tasks, prepare data, tokenize it, and train a universal model.
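The "everything is a token sequence" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not UniAudio's actual code: the task names, the field order, the special-token strings, and the names `TASK_FORMATS` and `build_sequence` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical task definitions: each task lists its fields in order,
# with the generation target last. Not taken from the UniAudio codebase.
TASK_FORMATS: Dict[str, List[str]] = {
    "tts": ["phone", "prompt_audio", "target_audio"],
    "vc": ["source_audio", "prompt_audio", "target_audio"],
    "text_to_music": ["text", "target_audio"],
}


def build_sequence(task: str, fields: Dict[str, List[int]]) -> List[str]:
    """Flatten one training example into a single token sequence."""
    if task not in TASK_FORMATS:
        raise KeyError(f"unknown task: {task}")
    seq = [f"<{task}_task>"]  # task identifier token comes first
    for name in TASK_FORMATS[task]:
        if name not in fields:
            raise ValueError(f"task {task!r} requires field {name!r}")
        seq.append(f"<{name}_start>")
        # Prefix each token ID with its field name so vocabularies stay disjoint.
        seq.extend(f"{name}:{tok}" for tok in fields[name])
        seq.append(f"<{name}_end>")
    seq.append("<eos>")
    return seq


if __name__ == "__main__":
    example = build_sequence("text_to_music", {"text": [5, 6], "target_audio": [1]})
    print(example)
```

Under this sketch, adding a new task only means adding one entry to the format table, which is the kind of modularity the description above refers to.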

Quick Start & Requirements

Install (run in order):
  conda create -n uniaudio python=3.8
  conda activate uniaudio
  cd UniAudio
  bash requirements.sh
  bash UniAudio/download.sh
Prerequisites: Python 3.8, Conda, LibriTTS dataset, wav.scp, phone.scp, utt2spk files.
Demo: http://dongchaoyang.top/UniAudio_demo/
Paper: https://arxiv.org/pdf/2310.00704.pdf
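The wav.scp and utt2spk files listed under Prerequisites are conventionally Kaldi-style tables: one entry per line, an utterance ID followed by whitespace and a value. The sketch below assumes that layout for all three files (that phone.scp follows the same convention is an assumption) and uses hypothetical helper names `read_kaldi_table` and `common_ids` to check that the files describe the same utterances before tokenization.

```python
from pathlib import Path
from typing import Dict, Set


def read_kaldi_table(path: str) -> Dict[str, str]:
    """Parse a 'key value' table (assumed Kaldi-style), one entry per line."""
    table: Dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key in table:
            raise ValueError(f"duplicate utterance ID {key!r} in {path}")
        table[key] = value
    return table


def common_ids(*tables: Dict[str, str]) -> Set[str]:
    """Return the utterance IDs that appear in every table."""
    if not tables:
        return set()
    return set.intersection(*(set(t) for t in tables))
```

A typical use would be to load the three files, compute `common_ids(wav, phone, utt2spk)`, and drop any utterance that is missing from one of them.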

Highlighted Details

Supports 11 audio generation tasks including TTS, VC, singing voice synthesis, speech enhancement, and text-to-sound/music.
Offers a pre-trained checkpoint (Open-UniAudio) trained on speech, sound, and music datasets (including DISCO-100k).
Provides code for neural audio codec models and a framework for defining custom tasks.
Model architecture details: n_layer: 16, n_head: 12, n_embd: 768.
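To put the listed hyperparameters in perspective, the following back-of-the-envelope sketch derives the per-head dimension and a rough parameter count. It assumes a standard GPT-style decoder block (four d×d attention projections plus an MLP with 4x expansion) and ignores biases, layer norms, and embeddings; UniAudio's actual architecture may differ, so treat the result as an order-of-magnitude estimate only. The function names are illustrative.

```python
def head_dim(n_embd: int, n_head: int) -> int:
    """Per-head dimension; the embedding size must divide evenly across heads."""
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return n_embd // n_head


def approx_block_params(n_layer: int, n_embd: int) -> int:
    """Rough weight count for standard transformer blocks (assumed layout).

    Attention: Q, K, V and output projections -> 4 * d^2
    MLP with 4x hidden expansion (assumption)  -> 8 * d^2
    """
    return n_layer * 12 * n_embd ** 2


if __name__ == "__main__":
    print(head_dim(768, 12))             # 64
    print(approx_block_params(16, 768))  # 113246208, roughly 113M
```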

Maintenance & Community

The project is actively developed, with plans to release more checkpoints and expand task support.
Contact: Dongchao (dcyang@se.cuhk.edu.hk) or Jinchuan (jinchuat@andrew.cmu.edu).

Licensing & Compatibility

License: MIT License.
Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects.

Limitations & Caveats

Some tasks (e.g., ASR, speaker verification) show performance gaps compared to state-of-the-art systems. Comprehensive documentation and additional task examples are planned for future releases.

Explore Similar Projects

UniAudio2 by yangdongchao

awesome-audio-plaza by metame-ai

SongGen by LiuZH-19

awesome-large-audio-models by EmulationAI

VITA-Audio by VITA-MLLM

tts by inworld-ai

soundstorm-pytorch by lucidrains

WavTokenizer by jishengpeng

SALMONN by bytedance

audiolm-pytorch by lucidrains

Kimi-Audio by MoonshotAI

Amphion by open-mmlab