unified-audio by alibaba

Unifies audio processing and generation across diverse tasks

Created 4 months ago
279 stars

Top 93.3% on SourcePulse

Project Summary

Summary

QuarkAudio offers a unified framework for diverse audio processing and generation tasks, built on a decoder-only autoregressive language model (AR-LM). It targets researchers and engineers and promotes reproducible research by consolidating multiple audio functions, from speech enhancement to voice conversion, into a single prompt-free model. This approach simplifies complex audio pipelines and accelerates innovation.

How It Works

The system employs a decoder-only AR-LM backbone for autoregressive speech-token prediction, inspired by LLMs. It achieves end-to-end compatibility by pairing feature extractors such as WavLM/HuBERT with discrete codecs such as H-Codec. This unified architecture lets a single model handle numerous audio tasks without explicit task instructions, presenting a novel and efficient paradigm for audio AI development.
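The pipeline described above (continuous features conditioning a decoder-only LM that emits discrete codec tokens, with no text prompt) can be sketched in miniature. This is a hedged illustration only: `encode_features`, `lm_step`, `VOCAB`, and `EOS` are hypothetical stand-ins, not QuarkAudio's actual API, and the toy "model" just produces deterministic random logits.

```python
import numpy as np

# Hypothetical sketch of the decoder-only AR loop; all names are
# illustrative stand-ins, not QuarkAudio's real interfaces.
VOCAB = 1024  # size of the discrete codec vocabulary (assumed)
EOS = 0       # end-of-sequence token id (assumed)

def encode_features(waveform: np.ndarray, hop: int = 320) -> np.ndarray:
    """Stand-in for a WavLM/HuBERT-style extractor: frames the waveform
    into fixed-size vectors (real extractors are learned networks)."""
    n_frames = len(waveform) // hop
    return waveform[: n_frames * hop].reshape(n_frames, hop)

def lm_step(features: np.ndarray, tokens: list[int]) -> np.ndarray:
    """Stand-in for one decoder-only LM forward pass: returns logits
    over the codec vocabulary for the next speech token."""
    rng = np.random.default_rng(len(tokens))  # deterministic toy logits
    return rng.standard_normal(VOCAB) + features.mean()

def generate(waveform: np.ndarray, max_tokens: int = 8) -> list[int]:
    """Prompt-free generation: the task is inferred from the audio
    conditioning alone; no text instruction is passed in."""
    feats = encode_features(waveform)
    tokens: list[int] = []
    for _ in range(max_tokens):
        next_tok = int(np.argmax(lm_step(feats, tokens)))  # greedy decode
        if next_tok == EOS:
            break
        tokens.append(next_tok)
    # In the real system an H-Codec-style decoder would turn these
    # discrete tokens back into a waveform.
    return tokens

tokens = generate(np.zeros(3200))
```

The key design point the sketch mirrors is that conditioning enters only through the audio features, so one set of LM weights can serve enhancement, conversion, and other tasks without per-task instructions.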

Quick Start & Requirements

Code and pretrained models are accessible via the linked repositories (QuarkAudio-HCodec, UniSE) and Hugging Face Spaces. Inference examples and a demo page are available. Specific installation commands, Python versions, and hardware prerequisites (e.g., GPU, CUDA) are not detailed here; see the linked resources.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1

Star History

96 stars in the last 30 days
