AudioBench by AudioLLMs

A universal benchmark for evaluating audio large language models

Created 1 year ago
258 stars

Top 98.2% on SourcePulse

Project Summary

AudioBench is a comprehensive benchmark suite designed to evaluate the performance of Audio Large Language Models (AudioLLMs) across a wide array of tasks. It serves researchers and developers working on models that process and understand audio data, providing a standardized framework for comparison and a live leaderboard for tracking progress. The project aims to accelerate the development of more capable and versatile audio-centric AI systems.

How It Works

AudioBench standardizes the evaluation of AudioLLMs by providing a unified interface to more than 50 datasets covering Automatic Speech Recognition (ASR), Speech Translation, Speech Question Answering, Speech Instruction Following, and audio understanding tasks such as emotion and accent recognition. It supports multiple evaluation metrics, from traditional ones such as WER and BLEU to model-as-judge metrics that use LLMs like GPT-4o and Llama 3 as evaluators. This combination allows a more holistic assessment of model capabilities than accuracy alone.
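
To make the traditional metric family concrete, here is a minimal sketch that computes WER and BLEU on a toy string pair with the jiwer and sacrebleu packages; it illustrates the metric types only and is not AudioBench's own scoring code.

    # Toy illustration of the string-matching metrics AudioBench reports
    # (WER for ASR, BLEU for speech translation). Not the repository's scoring code.
    from jiwer import wer
    import sacrebleu

    reference  = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over a lazy dog"

    print("WER :", wer(reference, hypothesis))                              # fraction of word-level edits
    print("BLEU:", sacrebleu.sentence_bleu(hypothesis, [reference]).score)  # n-gram overlap, 0-100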

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Model-as-judge evaluations require a vLLM server on one 80 GB GPU to host the judge model (e.g., Llama-3-70B-Instruct); a second GPU is needed to run inference for the models under evaluation. A sketch of calling such a judge endpoint follows this list.
  • Resources: Model-as-judge evaluation therefore requires significant GPU resources.
  • Links: Hugging Face Space Leaderboard, Hugging Face Datasets, AudioLLM Paper Collection
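
As referenced above, the following sketch shows one way to query a locally hosted judge model through vLLM's OpenAI-compatible endpoint; the port, judge model name, prompt wording, and 0-5 scale are assumptions about a typical deployment, not AudioBench's exact configuration.

    # Hedged sketch: querying a vLLM-hosted judge model via its OpenAI-compatible API.
    # Port, model name, prompt, and the 0-5 scale are assumptions, not AudioBench's exact setup.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not validate the key

    def judge(question: str, reference: str, model_answer: str) -> str:
        """Ask the judge LLM to rate a model answer against the reference."""
        prompt = (
            f"Question: {question}\n"
            f"Reference answer: {reference}\n"
            f"Model answer: {model_answer}\n"
            "Rate the model answer from 0 to 5 and justify the score in one sentence."
        )
        resp = client.chat.completions.create(
            model="meta-llama/Meta-Llama-3-70B-Instruct",  # assumed judge model
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
        )
        return resp.choices[0].message.content

    print(judge("What emotion does the speaker convey?", "anger", "The speaker sounds angry."))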

Highlighted Details

  • Supports over 50 datasets, including recent additions like MMAU and SEAME for code-switching evaluation.
  • Includes support for multiple languages and accents, with recent additions for Thai, Vietnamese, and Indonesian ASR.
  • Features a live leaderboard on Huggingface Spaces for tracking model performance.
  • Accommodates custom dataset loaders and new model integrations; a hypothetical loader sketch follows this list.
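
For the custom dataset loaders mentioned above, the sketch below shows the rough shape such a loader could take using the Hugging Face datasets library; the dataset name and the "audio"/"text" column names are placeholders, and AudioBench's actual loader interface may differ.

    # Hypothetical custom ASR dataset loader built on Hugging Face datasets.
    # The dataset name and the "audio"/"text" column names are placeholders;
    # AudioBench's real loader interface may differ.
    from datasets import Audio, load_dataset

    def load_custom_asr_dataset(name: str = "your-org/your-asr-dataset", split: str = "test"):
        """Yield dicts with a decoded waveform, its sampling rate, and the reference transcript."""
        ds = load_dataset(name, split=split)
        ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # decode and resample to 16 kHz
        for row in ds:
            yield {
                "array": row["audio"]["array"],
                "sampling_rate": row["audio"]["sampling_rate"],
                "reference": row["text"],
            }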

Maintenance & Community

The project is actively maintained, with frequent updates to supported datasets and models. The paper has been accepted to NAACL 2025. Model submissions can be made via email to bwang28c@gmail.com.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification on the licensing terms.

Limitations & Caveats

Some models, like WavLLM, are noted as no longer supported due to inference complexity. Several advanced models and benchmarks (e.g., ultravox, GLM4-Voice, AIR-Bench) are listed as "To-Do" or not yet supported, indicating ongoing development. The README does not specify the license, which could be a barrier for some users.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days
