versa by wavlab-speech

Audio/speech evaluation toolkit

Created 1 year ago

392 stars

Top 73.5% on SourcePulse

Project Summary

VERSA is a comprehensive toolkit for evaluating speech and audio quality, offering over 90 metrics for researchers and developers. It provides a unified framework for assessing audio across multiple dimensions, including perceptual quality, intelligibility, and technical measurements, with a focus on seamless integration and scalability.

How It Works

VERSA accepts several input formats (file paths, SCP files, and Kaldi-style ARKs) and integrates tightly with ESPnet. Evaluation can be distributed across a Slurm cluster, so it scales to large audio collections. Metric implementations follow the APIs published by the original algorithm developers, so models are not redistributed and existing workflows stay compatible.
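To make the SCP input format concrete, here is a minimal sketch of a reader for a Kaldi-style SCP file, where each non-empty line holds an utterance ID followed by a location. The function name `read_scp` is hypothetical and is not taken from VERSA's code; the sketch only illustrates the file layout and keeps everything after the ID as an opaque string.

```python
def read_scp(path):
    """Parse a Kaldi-style SCP file into {utterance_id: location}.

    Hypothetical helper for illustration; not part of VERSA's API.
    Each non-empty line is expected to look like: "<utt_id> <location>".
    """
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            # Split only on the first run of whitespace so that
            # locations containing spaces are kept intact.
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"{path}:{line_no}: expected '<id> <location>'")
            utt_id, location = parts
            if utt_id in entries:
                raise ValueError(f"{path}:{line_no}: duplicate id {utt_id!r}")
            entries[utt_id] = location
    return entries
```

Calling `read_scp("wav.scp")` on such a file would give a dictionary keyed by utterance ID, which is a convenient shape for pairing predicted and reference audio by ID.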

Quick Start & Requirements

Installation: git clone https://github.com/wavlab-speech/versa.git && cd versa && pip install .
Dependencies: Some metrics require additional installations via scripts in the tools directory.
Testing: Core functionality can be tested with python versa/test/test_pipeline/test_general.py.
Demo: An interactive demo is available via Colab: https://colab.research.google.com/github/wavlab-speech/versa/blob/main/demo/interspeech2024_tutorial.ipynb
Documentation: Full metrics documentation is available.

Highlighted Details

Supports over 90 evaluation/profiling metrics with 10x variants.
Integrated with ESPnet and offers Slurm-based distributed evaluation.
Features LLM-informed audio quality profiling, including Qwen2-Audio metrics.
Presented at NAACL 2025, showcasing its unified multi-metric evaluation framework.
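The Slurm-based distributed evaluation mentioned above relies on splitting the workload across jobs. The summary does not describe how VERSA partitions its inputs, so the following is only a sketch of one common approach: round-robin sharding of utterance IDs, with each Slurm array task picking its shard from the standard `SLURM_ARRAY_TASK_ID` environment variable. The function names are hypothetical.

```python
import os


def shard_round_robin(utt_ids, num_shards):
    """Split utterance IDs into num_shards lists of near-equal size.

    Hypothetical helper for illustration; not VERSA's own partitioning.
    IDs are sorted first so every job computes the same split.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    ordered = sorted(utt_ids)
    return [ordered[i::num_shards] for i in range(num_shards)]


def shard_for_this_task(utt_ids, num_shards, env=None):
    """Return the shard for the current Slurm array task.

    Assumes the job was submitted as an array with indices
    0..num_shards-1; falls back to shard 0 outside Slurm.
    """
    env = os.environ if env is None else env
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < num_shards:
        raise ValueError(f"task id {task_id} outside 0..{num_shards - 1}")
    return shard_round_robin(utt_ids, num_shards)[task_id]
```

Because the split is deterministic, per-shard results can be concatenated afterwards without coordinating between jobs.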

Maintenance & Community

Version 1.0 was released in December 2024, and the project was presented at NAACL 2025. Contributions are welcome via pull requests.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

Some metrics require manually installing dependencies that are not included in the core package. The README mentions upcoming support for multi-process and multi-GPU execution on a local machine.
