alibaba
Unifies audio processing and generation across diverse tasks
Top 93.3% on SourcePulse
Summary
QuarkAudio offers a unified framework for diverse audio processing and generation tasks, built upon a decoder-only autoregressive language model (AR-LM). It targets researchers and engineers, promoting reproducible research by consolidating multiple audio functions, from speech enhancement to voice conversion, into a single, prompt-free model. This approach simplifies complex audio pipelines and accelerates innovation.
How It Works
The system employs a decoder-only AR-LM backbone for autoregressive speech token prediction, inspired by LLMs. It achieves end-to-end compatibility by integrating with feature extractors like WavLM/Hubert and discrete codecs such as H-Codec. This unified architecture enables a single model to handle numerous audio tasks without explicit instructions, presenting a novel and efficient paradigm for audio AI development.
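The autoregressive token prediction described above can be illustrated with a minimal sketch. This is not QuarkAudio's code: `greedy_decode`, `toy_scores`, the `EOS` id, and the vocabulary size are all hypothetical, and greedy selection is an assumption, since the summary does not say how tokens are chosen. A real system would replace `toy_scores` with the AR-LM conditioned on extracted features, and pass the resulting ids to the codec decoder.

```python
from typing import Callable, List, Sequence

EOS = 0  # hypothetical end-of-sequence token id (assumption)


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], List[float]],
    prefix: Sequence[int],
    max_new_tokens: int = 16,
) -> List[int]:
    """Generate discrete tokens one at a time, each conditioned on all previous ones."""
    tokens = list(prefix)          # conditioning context (e.g. input-derived tokens)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Greedy choice: pick the highest-scoring token id.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == EOS:
            break
        tokens.append(next_id)     # feed the prediction back as context
        generated.append(next_id)
    return generated


def toy_scores(tokens: Sequence[int], vocab_size: int = 8) -> List[float]:
    """Toy stand-in for a language model: always favors (last token + 1) mod vocab_size."""
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


codec_tokens = greedy_decode(toy_scores, prefix=[3])
print(codec_tokens)
```

The loop is the same regardless of task: only the conditioning prefix changes, which is one way to read the claim that a single model can cover several audio tasks.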
Quick Start & Requirements
Code and pretrained models are accessible via linked repositories (QuarkAudio-HCodec, UniSE) and Hugging Face Spaces. Inference examples and a demo page are available. Specific installation commands, Python versions, or hardware prerequisites (e.g., GPU, CUDA) are not detailed here but can be found in the linked resources.