UniAudio by yangdongchao

Audio foundation model for universal audio generation

created 1 year ago
572 stars

Top 57.2% on sourcepulse

Project Summary

UniAudio is a versatile framework for universal audio generation, designed to handle diverse tasks like Text-to-Speech (TTS), Voice Conversion (VC), and Text-to-Music with a single model. It targets researchers and developers aiming to build flexible audio generation systems.

How It Works

UniAudio employs a unified framework that processes audio and associated metadata as sequences of tokens. This approach allows a single model to learn and generate various audio types by defining task-specific data formats and tokenization strategies. The core advantage lies in its modular design, enabling users to define new tasks, prepare data, tokenize it, and train a universal model.
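The idea of treating every task as one token sequence can be sketched as follows. This is a minimal illustration, not UniAudio's actual data format: the special tokens, the task ids, the vocabulary offsets, and the `Example` and `build_sequence` names are all hypothetical and chosen only to show how a task tag, conditioning tokens, and target audio tokens could share a single sequence.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical vocabulary layout; names and offsets are illustrative only.
SPECIAL_TOKENS = {"<start>": 0, "<end>": 1, "<sep>": 2}
TASK_TOKENS = {"tts": 3, "vc": 4, "text_to_music": 5}
TEXT_OFFSET = 100     # ids from 100 upward are reserved for text/phone tokens
AUDIO_OFFSET = 1000   # ids from 1000 upward are reserved for audio codec tokens


@dataclass
class Example:
    task: str                # key into TASK_TOKENS
    condition: List[int]     # raw conditioning ids (e.g. phones for TTS)
    condition_kind: str      # "text" or "audio"
    target_audio: List[int]  # raw codec ids of the audio to generate


def build_sequence(ex: Example) -> List[int]:
    """Flatten one example into a single token sequence:
    <start> <task> condition... <sep> target... <end>
    """
    if ex.task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {ex.task}")
    if ex.condition_kind not in ("text", "audio"):
        raise ValueError(f"unknown condition kind: {ex.condition_kind}")

    # Shift each modality into its own id range so tokens never collide.
    offset = TEXT_OFFSET if ex.condition_kind == "text" else AUDIO_OFFSET
    condition = [offset + t for t in ex.condition]
    target = [AUDIO_OFFSET + t for t in ex.target_audio]

    return [
        SPECIAL_TOKENS["<start>"],
        TASK_TOKENS[ex.task],
        *condition,
        SPECIAL_TOKENS["<sep>"],
        *target,
        SPECIAL_TOKENS["<end>"],
    ]


tts_sequence = build_sequence(Example("tts", [1, 2], "text", [7, 8]))
vc_sequence = build_sequence(Example("vc", [5], "audio", [9]))
```

Under this layout, adding a new task would amount to registering one more task token and deciding which modalities appear before the separator, which is the kind of modularity the paragraph above describes.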

Quick Start & Requirements

Highlighted Details

  • Supports 11 audio generation tasks including TTS, VC, singing voice synthesis, speech enhancement, and text-to-sound/music.
  • Offers a pre-trained checkpoint (Open-UniAudio) trained on speech, sound, and music datasets (including DISCO-100k).
  • Provides code for neural audio codec models and a framework for defining custom tasks.
  • Model architecture details: n_layer: 16, n_head: 12, n_embd: 768.
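To put the listed hyperparameters in context, the sketch below derives the per-head dimension and a rough parameter count. It assumes a standard GPT-style decoder block (four attention projections plus an MLP with 4x expansion) and ignores embeddings, biases, and normalization layers; UniAudio's actual architecture may differ, so the count is only an order-of-magnitude estimate. The `TransformerConfig` class is a hypothetical helper, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    # Values taken from the summary above.
    n_layer: int = 16
    n_head: int = 12
    n_embd: int = 768

    @property
    def head_dim(self) -> int:
        """Width of each attention head; n_embd must divide evenly."""
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        return self.n_embd // self.n_head

    def approx_block_params(self) -> int:
        """Rough weight count for the stacked blocks only.

        Assumption: each block has 4 * d^2 attention weights (Q, K, V,
        output projection) and 8 * d^2 MLP weights (4x expansion), i.e.
        12 * d^2 per layer. Embeddings, biases, and norms are excluded.
        """
        return 12 * self.n_layer * self.n_embd ** 2


cfg = TransformerConfig()
print(f"head_dim = {cfg.head_dim}")
print(f"approx block parameters = {cfg.approx_block_params():,}")
```

Under these assumptions each head is 64 dimensions wide and the stacked blocks hold roughly 113 million weights.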

Maintenance & Community

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive license allows commercial use and integration into closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

The project is described as still under development, although the repository shows no commits in the past year. Some tasks (e.g., ASR, speaker verification) show performance gaps compared to state-of-the-art systems. Comprehensive documentation and additional task examples are listed as planned for future releases.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days
