FunAudioLLM: Advanced Audio LLM for natural, low-latency voice interactions
Fun-Audio-Chat is a Large Audio Language Model designed for natural, low-latency voice interactions. It addresses the computational demands of audio LLMs by introducing an efficient dual-resolution speech representation, which cuts compute substantially while preserving high speech quality. The project offers researchers and developers state-of-the-art performance across a range of spoken language tasks, including question answering, understanding, function calling, and instruction following.
How It Works
The core innovation is Dual-Resolution Speech Representations: an efficient 5 Hz shared backbone combined with a 25 Hz refined head. This reduces GPU hours by nearly 50% compared to standard 12.5 Hz or 25 Hz models without sacrificing speech quality. In addition, Core-Cocktail training preserves the capabilities of the underlying text LLM, yielding top-tier results on demanding audio benchmarks.
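The sketch below illustrates the dual-resolution idea in PyTorch. It is a minimal, illustrative example only: the module names, layer sizes, and the pooling/upsampling scheme are assumptions, not the project's actual architecture. The point it shows is that most of the compute runs on a 5x-shorter 5 Hz sequence, while a lighter head refines the full 25 Hz sequence.

```python
import torch
import torch.nn as nn


class DualResolutionSketch(nn.Module):
    """Illustrative only: a coarse 5 Hz shared backbone plus a 25 Hz refinement head."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # 5x temporal pooling: 25 Hz frames -> 5 Hz frames for the shared backbone.
        self.downsample = nn.AvgPool1d(kernel_size=5, stride=5)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # A lighter head runs at the original 25 Hz resolution.
        self.refine_head = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=1,
        )

    def forward(self, frames_25hz: torch.Tensor) -> torch.Tensor:
        # frames_25hz: (batch, time_at_25hz, dim)
        coarse = self.downsample(frames_25hz.transpose(1, 2)).transpose(1, 2)
        coarse = self.backbone(coarse)                 # heavy compute on 5x fewer frames
        upsampled = coarse.repeat_interleave(5, dim=1) # broadcast coarse context back to 25 Hz
        return self.refine_head(frames_25hz + upsampled)


x = torch.randn(1, 100, 512)          # 4 seconds of 25 Hz features
out = DualResolutionSketch()(x)
print(out.shape)                       # torch.Size([1, 100, 512])
```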
Quick Start & Requirements
Clone the repository with submodules (git clone --recurse-submodules), activate a Python 3.12 environment, install PyTorch 2.8.0 with CUDA 12.8 support, and then run pip install -r requirements.txt. ffmpeg is also a prerequisite. Pretrained models can be downloaded via huggingface-hub or modelscope.
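As a minimal sketch of the huggingface-hub route, the snippet below downloads a model snapshot to a local directory; the repo id is a placeholder assumption, not a confirmed model id, so substitute the identifier published in the project's README.

```python
# Illustrative model download via huggingface-hub.
# NOTE: the repo_id below is a placeholder; replace it with the project's published model id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="FunAudioLLM/Fun-Audio-Chat")
print("Model files downloaded to:", local_dir)
```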
Maintenance & Community
The project is developed by the "Tongyi Fun Team". Community interaction is facilitated via GitHub Issues, Pull Requests, and email. An official Dingding chat group is also available for support.
Licensing & Compatibility
Fun-Audio-Chat is licensed under the Apache License (Version 2.0). The project notes that it contains third-party components under other open-source licenses, with details available in the NOTICE file. The Apache 2.0 license is generally permissive for commercial use.
Limitations & Caveats
The provided README does not explicitly detail limitations such as alpha status, known bugs, or unsupported platforms. The release appears to be based on a technical report, suggesting a research-oriented focus.