UltraEval-Audio by OpenBMB

Unified framework for comprehensive audio foundation model evaluation

Created 1 year ago

305 stars

Top 87.6% on SourcePulse

Summary

UltraEval-Audio is an open-source framework for comprehensively evaluating large audio foundation models on speech understanding and generation. It targets researchers and engineers, offering a unified, efficient, and reproducible platform that simplifies benchmark management and model replication.

How It Works

The framework aggregates 34 authoritative benchmarks across speech, sound, medicine, and music, supporting 12 task categories and 10 languages. Its core innovation is an "Isolated Runtime" mechanism, which automatically manages each model's dependencies in a separate environment so that conflicting requirements never share an install. It supports direct replication of popular open-source models through detailed commands and documentation, and it automates dataset acquisition and integrates official evaluation tooling for metrics such as WER and BLEU.
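To make the WER metric mentioned above concrete, here is a minimal sketch of word error rate as word-level edit distance divided by the number of reference words. It is an illustration only, not the framework's own scorer: the function name word_error_rate is hypothetical, and official tools usually normalize case and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words, so the rate is 2/6. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.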

Quick Start & Requirements

Installation involves cloning the repository (git clone https://github.com/OpenBMB/UltraEval-Audio.git), creating a Python 3.10 environment (via Conda or uv), and installing the package in editable mode (pip install -e . or uv pip install -e .). A CUDA-capable setup is implied rather than stated. Evaluating certain models requires API keys (e.g., OpenAI, Google). The README links to documentation for integrating custom datasets and models.
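Before launching an evaluation, it can help to check the two requirements named above: the Python version and the API keys. The following is a rough preflight sketch, not part of UltraEval-Audio. The environment variable names OPENAI_API_KEY and GOOGLE_API_KEY are assumptions about how keys might be supplied, and the helper names are hypothetical; the project's own documentation is the authority on where keys are configured. The strict match on 3.10 reflects the version this summary names, and whether newer versions also work is not stated.

```python
import os
import sys

# The summary names Python 3.10 as the environment to create.
REQUIRED_PYTHON = (3, 10)

# Assumed variable names, for illustration only.
API_KEY_VARS = {"openai": "OPENAI_API_KEY", "google": "GOOGLE_API_KEY"}


def python_version_ok(version=None) -> bool:
    """Return True when the interpreter's major.minor version is 3.10."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor == REQUIRED_PYTHON


def missing_api_keys(providers, env=None):
    """List the environment variables that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [API_KEY_VARS[p] for p in providers if not env.get(API_KEY_VARS[p])]


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: expected Python 3.10, found", sys.version.split()[0])
    for name in missing_api_keys(["openai", "google"]):
        print("Missing API key variable:", name)
```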

Highlighted Details

Comprehensive coverage: 34 benchmarks, 12 tasks, 10 languages across 4 domains.
Isolated Runtime: Prevents dependency conflicts during model inference.
Direct Model Replication: Supports numerous popular models (e.g., Qwen, Kimi-Audio, GPT-4o, Gemini) with reproducible results.
Integrated Leaderboards: Showcases performance across Audio Understanding, Generation, and Codec tasks.
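The general idea behind the Isolated Runtime item above can be sketched with Python's standard venv module: give each model its own virtual environment and run its inference code with that environment's interpreter. This is a simplified illustration under stated assumptions, not the framework's actual implementation, which is not described here. The helpers ensure_env and run_in_env are hypothetical names.

```python
import os
import subprocess
import venv
from pathlib import Path


def ensure_env(root: Path, model_name: str) -> Path:
    """Create a virtual environment for one model if needed; return its interpreter path."""
    env_dir = root / model_name
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    python = env_dir / bin_dir / exe
    if not python.exists():
        # with_pip=False keeps this sketch fast. A real setup would install
        # the model's pinned requirements into the environment at this point.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return python


def run_in_env(python: Path, code: str) -> str:
    """Run a snippet with the environment's own interpreter and return its stdout."""
    result = subprocess.run(
        [str(python), "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling ensure_env(root, "model_a") and ensure_env(root, "model_b") yields two interpreters whose sys.prefix values differ, so packages installed for one model are invisible to the other. Running inference in a subprocess, rather than importing the model into the evaluator's own process, is what keeps conflicting dependency versions apart.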

Maintenance & Community

The project shows active development with frequent updates noted in its changelog through late 2025. Community support is available via a Discord group (https://discord.com/invite/Qrsbft4e). The project authors are listed, but specific maintainer details or corporate sponsorships are not detailed.

Licensing & Compatibility

The repository's license is not specified in the provided README content, which is a critical omission for assessing commercial use or derivative works. The framework is designed for seamless integration with existing evaluation pipelines.

Limitations & Caveats

Reproducing results for specific models may require consulting an FAQ. Evaluation of proprietary models necessitates obtaining and configuring API keys. The arXiv paper date appears to be a placeholder (2026).

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

4 stars in the last 30 days