Audio-language model for audio understanding and chat
Qwen-Audio is a foundational multimodal large language model for universal audio understanding. It accepts diverse audio (speech, natural sounds, music) together with text and generates text as output. It targets researchers and developers who want a single versatile audio model, and it reports state-of-the-art results on multiple benchmarks without task-specific fine-tuning.
How It Works
Qwen-Audio is built upon the Qwen-7B LLM and Whisper-large-v2 audio encoder. It employs a multi-task learning framework to handle variations in textual labels across datasets, enabling knowledge sharing and improved performance on tasks like speech recognition, audio captioning, and acoustic scene classification. Qwen-Audio-Chat is a fine-tuned version for conversational AI, supporting multi-turn dialogues and audio-oriented interactions.
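The idea of sharing one model across datasets whose textual labels differ can be pictured as prefixing every training target with a task marker, so the decoder knows which label style to produce. The sketch below only illustrates that idea: the tag strings, the `Example` class, and `format_target` are hypothetical names chosen for this example and are not the model's actual tokens or training code.

```python
from dataclasses import dataclass

# Hypothetical tag vocabulary, for illustration only (not Qwen-Audio's real tokens).
TASK_TAGS = {
    "asr": "<|task:transcribe|>",
    "caption": "<|task:caption|>",
    "scene": "<|task:classify_scene|>",
}


@dataclass
class Example:
    task: str   # key into TASK_TAGS
    label: str  # dataset-specific textual label


def format_target(example: Example) -> str:
    """Prefix the label with its task tag so heterogeneous labels share one decoder."""
    if example.task not in TASK_TAGS:
        raise ValueError(f"unknown task: {example.task!r}")
    return f"{TASK_TAGS[example.task]} {example.label.strip()}"


# Three datasets with very different label styles end up in one target format.
batch = [
    Example("asr", "the quick brown fox"),
    Example("caption", "A dog barks while cars pass by."),
    Example("scene", "street_traffic"),
]
targets = [format_target(e) for e in batch]
```

With a shared prefix scheme like this, a transcript, a free-form caption, and a class name can all be trained as plain text targets without the label formats interfering with each other.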
Quick Start & Requirements
pip install -r requirements.txt
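Once the requirements are installed, a single chat turn might look like the sketch below. It is not taken from this page: the model id `Qwen/Qwen-Audio-Chat`, the `from_list_format` tokenizer helper, and the `model.chat` method are assumptions based on the remote code published with the checkpoint, and `build_query_items`, `load`, and `ask` are hypothetical helper names. `device_map="auto"` additionally assumes the `accelerate` package is available.

```python
from typing import Dict, List

MODEL_ID = "Qwen/Qwen-Audio-Chat"  # assumed Hugging Face model id


def build_query_items(audio_path: str, question: str) -> List[Dict[str, str]]:
    """Describe one turn as a list of audio and text items."""
    return [{"audio": audio_path}, {"text": question}]


def load(model_id: str = MODEL_ID):
    """Load tokenizer and model; transformers is imported lazily because it is heavy."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", trust_remote_code=True
    ).eval()
    return tokenizer, model


def ask(tokenizer, model, audio_path: str, question: str, history=None):
    """Run one dialogue turn and return (response, updated_history)."""
    # Assumed API: from_list_format and chat come from the checkpoint's remote code.
    query = tokenizer.from_list_format(build_query_items(audio_path, question))
    return model.chat(tokenizer, query=query, history=history)
```

A first turn would then be `tokenizer, model = load()` followed by `response, history = ask(tokenizer, model, "sample.flac", "What does the speaker say?")`, where `sample.flac` is a placeholder path. Passing the returned `history` into the next call continues the same conversation.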
Highlighted Details
Maintenance & Community
Contact: qianwen_opensource@alibabacloud.com (for research/product teams).
Licensing & Compatibility
Limitations & Caveats