UniAudio by yangdongchao

Audio foundation model for universal audio generation

Created 1 year ago
577 stars

Top 56.1% on SourcePulse

Project Summary

UniAudio is a versatile framework for universal audio generation, designed to handle diverse tasks like Text-to-Speech (TTS), Voice Conversion (VC), and Text-to-Music with a single model. It targets researchers and developers aiming to build flexible audio generation systems.

How It Works

UniAudio employs a unified framework that processes audio and associated metadata as sequences of tokens. This approach allows a single model to learn and generate various audio types by defining task-specific data formats and tokenization strategies. The core advantage lies in its modular design, enabling users to define new tasks, prepare data, tokenize it, and train a universal model.
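As a rough illustration of the "everything is a token sequence" idea, the sketch below assembles one training sequence from a task-specific format. All names here (`TASK_FORMATS`, `Vocab`, `build_sequence`, the segment names, and the special-token spelling) are hypothetical and chosen for this example; they are not taken from the UniAudio codebase, and the real tokenizers (for example a neural audio codec) are replaced by pre-tokenized inputs.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical task formats: each task is an ordered list of
# (segment name, modality) pairs. Illustrative only.
TASK_FORMATS: Dict[str, List[Tuple[str, str]]] = {
    "tts": [("phone_seq", "phone"), ("prompt_audio", "audio"), ("target_audio", "audio")],
    "vc": [("source_audio", "audio"), ("prompt_audio", "audio"), ("target_audio", "audio")],
    "text_to_music": [("text", "text"), ("target_audio", "audio")],
}


class Vocab:
    """Shared vocabulary that assigns an integer id to each new token string."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def id(self, token: str) -> int:
        # Reuse the existing id, or allocate the next free one.
        return self._ids.setdefault(token, len(self._ids))


def build_sequence(task: str, segments: Dict[str, Sequence], vocab: Vocab) -> List[int]:
    """Flatten all segments of one example into a single id sequence."""
    if task not in TASK_FORMATS:
        raise KeyError(f"unknown task: {task}")
    seq = [vocab.id(f"<task:{task}>")]  # task identifier comes first
    for name, modality in TASK_FORMATS[task]:
        seq.append(vocab.id(f"<{name}>"))  # segment start marker
        # Tokens are namespaced by modality so audio ids never collide with text ids.
        seq.extend(vocab.id(f"{modality}:{tok}") for tok in segments[name])
        seq.append(vocab.id(f"</{name}>"))  # segment end marker
    return seq


if __name__ == "__main__":
    vocab = Vocab()
    example = {"text": ["calm", "piano"], "target_audio": [5, 9, 5]}
    print(build_sequence("text_to_music", example, vocab))
```

Under these assumptions, adding a new task amounts to adding one entry to `TASK_FORMATS` and supplying a tokenizer for any new modality; the model itself only ever sees a flat list of integer ids.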

Quick Start & Requirements

Highlighted Details

  • Supports 11 audio generation tasks including TTS, VC, singing voice synthesis, speech enhancement, and text-to-sound/music.
  • Offers a pre-trained checkpoint (Open-UniAudio) trained on speech, sound, and music datasets (including DISCO-100k).
  • Provides code for neural audio codec models and a framework for defining custom tasks.
  • Model architecture details: n_layer: 16, n_head: 12, n_embd: 768.
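To give a sense of scale for these hyperparameters, the following sketch does a back-of-the-envelope parameter count. It assumes the values describe a stack of standard Transformer blocks with a 4x feed-forward expansion, which this summary does not confirm; embeddings, biases, layer norms, and any additional sub-modules are ignored, so the result is only an order-of-magnitude estimate. The function name `estimate_block_params` is made up for this example.

```python
def estimate_block_params(n_layer: int, n_embd: int, ffn_mult: int = 4) -> int:
    """Approximate weight count of n_layer standard Transformer blocks.

    Assumption (not confirmed by the source): each block has four
    n_embd x n_embd attention projections (Q, K, V, output) and a
    two-layer feed-forward network with hidden size ffn_mult * n_embd.
    """
    attention = 4 * n_embd * n_embd
    feed_forward = 2 * n_embd * (ffn_mult * n_embd)
    return n_layer * (attention + feed_forward)


n_layer, n_head, n_embd = 16, 12, 768

head_dim = n_embd // n_head                      # 768 / 12 = 64 dimensions per head
total = estimate_block_params(n_layer, n_embd)   # 16 * 12 * 768^2

print(f"head_dim = {head_dim}")
print(f"approx. block parameters = {total:,}")   # 113,246,208
```

Under these assumptions the Transformer blocks alone hold roughly 113M weights; the actual checkpoint size may differ.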

Maintenance & Community

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is still under active development, with some tasks (e.g., ASR, speaker verification) showing performance gaps compared to state-of-the-art systems. Comprehensive documentation and additional task examples are planned for future releases.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab

0.2%
9k
Toolkit for audio, music, and speech generation research
Created 1 year ago
Updated 3 months ago