A universal benchmark for evaluating audio large language models
AudioBench is a comprehensive benchmark suite designed to evaluate the performance of Audio Large Language Models (AudioLLMs) across a wide array of tasks. It serves researchers and developers working on models that process and understand audio data, providing a standardized framework for comparison and a live leaderboard for tracking progress. The project aims to accelerate the development of more capable and versatile audio-centric AI systems.
How It Works
AudioBench standardizes the evaluation of AudioLLMs by providing a unified interface to over 50 diverse datasets covering Automatic Speech Recognition (ASR), Speech Translation, Speech Question Answering, Speech Instruction Following, and audio understanding tasks such as emotion and accent recognition. It supports multiple evaluation metrics, from traditional measures such as WER and BLEU to model-as-judge scoring that leverages LLMs such as GPT-4o and Llama 3. This combination allows a holistic assessment of model capabilities beyond simple accuracy.
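To make the idea concrete, the sketch below shows how a task-to-metric registry of this kind can be organized. It is a minimal illustration, not AudioBench's actual API: the `evaluate` and `llm_judge_score` functions, the task keys, and the placeholder judge logic are all hypothetical, with only word error rate implemented concretely.

```python
# Minimal sketch of a task -> metric registry; illustrative only, not AudioBench's API.
from typing import Callable, Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def llm_judge_score(question: str, reference: str, hypothesis: str) -> float:
    """Stand-in for a model-as-judge metric (e.g., prompting GPT-4o or Llama 3
    to rate an open-ended answer). Here it just checks for the reference answer."""
    return 1.0 if reference.lower() in hypothesis.lower() else 0.0


# One registry covering closed-form metrics (ASR) and judged tasks (speech QA),
# mirroring the idea of a single entry point across heterogeneous datasets.
METRICS: Dict[str, Callable[..., float]] = {
    "asr": word_error_rate,
    "speech_qa": llm_judge_score,
}


def evaluate(task: str, records: List[dict]) -> float:
    """Average a task's metric over prediction records keyed by metric arguments."""
    metric = METRICS[task]
    return sum(metric(**r) for r in records) / len(records)


if __name__ == "__main__":
    asr_records = [{"reference": "turn on the lights", "hypothesis": "turn on the light"}]
    print("ASR WER:", evaluate("asr", asr_records))  # 0.25
```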
Quick Start & Requirements
pip install -r requirements.txt
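Beyond installing dependencies, a quick way to confirm the metric stack works is to score a toy example. The snippet below is a hedged smoke test: it assumes jiwer and sacrebleu (common implementations of the WER and BLEU metrics mentioned above) are available in the environment; they may or may not be pulled in by requirements.txt.

```python
# Smoke test for the traditional metrics named above (WER, BLEU).
# Assumes jiwer and sacrebleu are installed; install them separately if needed.
import jiwer
import sacrebleu

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# Word error rate: fraction of substituted/inserted/deleted words.
print("WER:", jiwer.wer(reference, hypothesis))

# Corpus BLEU: sacrebleu expects a list of hypotheses and a list of
# reference streams (one inner list per reference set).
bleu = sacrebleu.corpus_bleu([hypothesis], [[reference]])
print("BLEU:", bleu.score)
```

For full evaluation runs, follow the model- and dataset-specific entry points documented in the repository itself.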
Highlighted Details
Maintenance & Community
The project is actively maintained, with frequent updates to supported datasets and models. The paper has been accepted to NAACL 2025. Model submissions can be made via email to bwang28c@gmail.com.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Commercial use or closed-source integration would require clarification of the licensing terms from the maintainers.
Limitations & Caveats
Some models, like WavLLM, are noted as no longer supported due to inference complexity. Several advanced models and benchmarks (e.g., ultravox, GLM4-Voice, AIR-Bench) are listed as "To-Do" or not yet supported, indicating ongoing development. The README does not specify the license, which could be a barrier for some users.