unified-audio by alibaba

Unifies audio processing and generation across diverse tasks

Created 4 months ago
279 stars

Top 93.3% on SourcePulse

Project Summary

Summary

QuarkAudio offers a unified framework for diverse audio processing and generation tasks, built on a decoder-only autoregressive language model (AR-LM). It targets researchers and engineers and promotes reproducible research by consolidating multiple audio functions, from speech enhancement to voice conversion, into a single prompt-free model. This approach simplifies complex audio pipelines and accelerates innovation.

How It Works

The system employs a decoder-only AR-LM backbone for autoregressive speech-token prediction, inspired by LLMs. It achieves end-to-end compatibility by pairing feature extractors such as WavLM/HuBERT with discrete codecs such as H-Codec. This unified architecture lets a single model handle numerous audio tasks without explicit task instructions, presenting a novel and efficient paradigm for audio AI development.
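The pipeline described above (continuous features conditioning a decoder-only LM that emits discrete codec tokens, with no text prompt) can be sketched in miniature. This is a hedged illustration only: `encode_features`, `lm_step`, `VOCAB`, and `EOS` are hypothetical stand-ins, not QuarkAudio's actual API, and the toy "model" just produces deterministic random logits.

```python
import numpy as np

# Hypothetical sketch of the decoder-only AR loop; all names are
# illustrative stand-ins, not QuarkAudio's real interfaces.
VOCAB = 1024  # size of the discrete codec vocabulary (assumed)
EOS = 0       # end-of-sequence token id (assumed)

def encode_features(waveform: np.ndarray, hop: int = 320) -> np.ndarray:
    """Stand-in for a WavLM/HuBERT-style extractor: frames the waveform
    into fixed-size vectors (real extractors are learned networks)."""
    n_frames = len(waveform) // hop
    return waveform[: n_frames * hop].reshape(n_frames, hop)

def lm_step(features: np.ndarray, tokens: list[int]) -> np.ndarray:
    """Stand-in for one decoder-only LM forward pass: returns logits
    over the codec vocabulary for the next speech token."""
    rng = np.random.default_rng(len(tokens))  # deterministic toy logits
    return rng.standard_normal(VOCAB) + features.mean()

def generate(waveform: np.ndarray, max_tokens: int = 8) -> list[int]:
    """Prompt-free generation: the task is inferred from the audio
    conditioning alone; no text instruction is passed in."""
    feats = encode_features(waveform)
    tokens: list[int] = []
    for _ in range(max_tokens):
        next_tok = int(np.argmax(lm_step(feats, tokens)))  # greedy decode
        if next_tok == EOS:
            break
        tokens.append(next_tok)
    # In the real system an H-Codec-style decoder would turn these
    # discrete tokens back into a waveform.
    return tokens

tokens = generate(np.zeros(3200))
```

The key design point the sketch mirrors is that conditioning enters only through the audio features, so one set of LM weights can serve enhancement, conversion, and other tasks without per-task instructions.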

Quick Start & Requirements

Code and pretrained models are accessible via the linked repositories (QuarkAudio-HCodec, UniSE) and Hugging Face Spaces. Inference examples and a demo page are available. Specific installation commands, Python versions, and hardware prerequisites (e.g., GPU, CUDA) are not detailed here; see the linked resources.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1

Star History

96 stars in the last 30 days
