versa by wavlab-speech

Audio/speech evaluation toolkit

Created 1 year ago
328 stars

Top 83.2% on SourcePulse

Project Summary

VERSA is a comprehensive toolkit for evaluating speech and audio quality, offering over 90 metrics for researchers and developers. It provides a unified framework for assessing audio across multiple dimensions, including perceptual quality, intelligibility, and technical measurements, with a focus on seamless integration and scalability.
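To show what a "technical measurement" computes, here is a minimal sketch of one classic signal-level metric: signal-to-noise ratio (SNR) in decibels, treating the difference between a reference and an estimate as noise. This is a generic textbook formula, not VERSA's implementation of any metric. The function name `snr_db` is hypothetical.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, treating (reference - estimate) as noise.

    Hypothetical illustration only; not VERSA's own metric code.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf  # identical signals: no noise at all
    if signal_power == 0.0:
        return -math.inf  # silent reference: no signal to measure against
    return 10.0 * math.log10(signal_power / noise_power)


# An estimate scaled to 90% of the reference: noise power is 1% of signal power.
print(snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))  # ≈ 20.0 dB
```

A toolkit such as VERSA wraps many measurements of this kind, alongside model-based perceptual and intelligibility scores, behind one evaluation interface.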

How It Works

VERSA uses a flexible architecture that accepts several input formats (file paths, SCP files, and Kaldi-style ARKs) and integrates tightly with ESPnet. It supports distributed evaluation with Slurm, so large audio collections can be processed at scale. The toolkit calls each metric through the original algorithm developers' APIs, which avoids redistributing models and keeps it compatible with existing workflows.
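A Kaldi-style SCP file maps one utterance ID to one audio location per line (`<utt_id> <path>`). Evaluating generated audio against references then means pairing the two lists by ID. The sketch below shows that pairing step. It assumes plain-text SCP content, and the helpers `parse_scp` and `pair_by_id` are hypothetical names, not part of VERSA's API.

```python
from typing import Dict, List, Tuple


def parse_scp(text: str) -> Dict[str, str]:
    """Parse Kaldi-style SCP text: one '<utt_id> <path-or-location>' entry per line."""
    entries: Dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<utt_id> <path>', got {line!r}")
        utt_id, location = parts
        if utt_id in entries:
            raise ValueError(f"line {lineno}: duplicate utterance ID {utt_id!r}")
        entries[utt_id] = location
    return entries


def pair_by_id(pred: Dict[str, str], ref: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Match predicted and reference entries sharing an utterance ID, sorted by ID."""
    missing = sorted(set(pred) - set(ref))
    if missing:
        raise KeyError(f"no reference for: {missing}")
    return [(utt_id, pred[utt_id], ref[utt_id]) for utt_id in sorted(pred)]


pred = parse_scp("utt1 gen/utt1.wav\nutt2 gen/utt2.wav\n")
ref = parse_scp("utt2 ref/utt2.wav\nutt1 ref/utt1.wav\n")
print(pair_by_id(pred, ref))
```

Keying on utterance IDs rather than line order lets the two lists be written in any order. It also causes a missing reference to fail loudly instead of being scored against the wrong file.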

Quick Start & Requirements

Highlighted Details

  • Supports over 90 evaluation/profiling metrics, with roughly ten times as many metric variants.
  • Integrated with ESPnet and offers Slurm-based distributed evaluation.
  • Features LLM-informed audio quality profiling, including Qwen2-Audio metrics.
  • Presented at NAACL 2025, showcasing its unified multi-metric evaluation framework.
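Distributed evaluation generally splits the utterance list into shards and hands each shard to a separate job. The sketch below shows one way to compute an even, contiguous split. It is a generic technique and an assumption about how such splitting could work, not VERSA's actual Slurm integration. The function `shard` is hypothetical.

```python
from typing import List, Sequence


def shard(items: Sequence[str], num_shards: int, shard_index: int) -> List[str]:
    """Return the contiguous slice of `items` assigned to one shard (0-based index).

    Shard sizes differ by at most one, and every item lands in exactly one shard.
    """
    if num_shards < 1 or not 0 <= shard_index < num_shards:
        raise ValueError("need num_shards >= 1 and 0 <= shard_index < num_shards")
    base, extra = divmod(len(items), num_shards)
    # The first `extra` shards each take one additional item.
    start = shard_index * base + min(shard_index, extra)
    end = start + base + (1 if shard_index < extra else 0)
    return list(items[start:end])


utt_ids = [f"utt{i}" for i in range(10)]
for i in range(3):
    print(i, shard(utt_ids, num_shards=3, shard_index=i))
```

In a Slurm job array submitted with `--array=0-2`, each task could read its index from the `SLURM_ARRAY_TASK_ID` environment variable and pass it as `shard_index`.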

Maintenance & Community

The project was presented at NAACL 2025 and released v1.0 in Dec 2024. Contributions are welcome via Pull Requests.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

Some metrics require manually installing dependencies that are not included in the core package. The README lists support for multi-process and multi-GPU execution on a single local machine as upcoming.

Health Check
Last Commit: 1 day ago
Responsiveness: 1 day
Pull Requests (30d): 2
Issues (30d): 1
Star History: 14 stars in the last 30 days
