fadtk by microsoft

Standardized library for audio distance metrics

Created 2 years ago
264 stars

Top 96.4% on SourcePulse

Project Summary

Summary

FADtk provides a standardized Python library for calculating Fréchet Audio Distance (FAD), a metric for evaluating audio generation models, particularly in music. It targets researchers and engineers, offering efficient FAD computation, outlier detection, and the use of pre-computed statistics, streamlining generative audio model assessment.
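For reference, FAD is the Fréchet distance between two multivariate Gaussians fitted to embedding sets. The standard closed form is given below; the symbol names are chosen here for illustration and do not come from the project page: \mu_r, \Sigma_r are the mean and covariance of the reference embeddings, and \mu_g, \Sigma_g are those of the generated (evaluated) embeddings.

```latex
\mathrm{FAD}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A lower value means the two embedding distributions are closer; the score is zero when both means and covariances coincide.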

How It Works

The toolkit computes audio embeddings using a selection of pre-trained models (e.g., CLAP, Encodec, Wav2vec 2.0) and then calculates FAD scores between reference and generated audio datasets. This allows quantitative comparison of audio quality and diversity. Two variants are supported: FAD∞, which extrapolates the score to an infinite sample size to reduce sample-size bias, and per-song FAD, which scores individual items to identify outliers.
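The idea behind an infinite-sample-size score can be sketched as follows. This is a minimal illustration, assuming the usual procedure of computing FAD at several sample sizes N, fitting a straight line against 1/N, and reading off the intercept. The function name `extrapolate_fad_inf` and the input numbers are hypothetical and are not part of FADtk.

```python
from typing import Sequence, Tuple


def extrapolate_fad_inf(
    sample_sizes: Sequence[int], fad_scores: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of fad ~ slope * (1 / N) + intercept.

    Returns (intercept, slope). The intercept is the estimated score
    as N grows without bound, since 1 / N then goes to zero.
    Hypothetical helper for illustration; not FADtk's API.
    """
    if len(sample_sizes) != len(fad_scores):
        raise ValueError("sample_sizes and fad_scores must have equal length")
    if len(sample_sizes) < 2:
        raise ValueError("need at least two points to fit a line")

    xs = [1.0 / n for n in sample_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(fad_scores) / len(fad_scores)

    # Ordinary least squares for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("sample sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, fad_scores))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, made-up scores that follow fad = 2.0 + 50 / N (illustration only).
sizes = [100, 200, 400, 800]
scores = [2.0 + 50.0 / n for n in sizes]
fad_inf, slope = extrapolate_fad_inf(sizes, scores)
print(f"estimated score at infinite sample size: {fad_inf:.3f}")
```

In practice each point would come from a FAD computed on a random subset of the evaluated set, so the fit would be noisy rather than exact.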

Quick Start & Requirements

Install with pip: pip install fadtk. PyTorch must be installed beforehand. The library is tested on Python 3.12 and supports versions greater than 3.10 on Linux, Windows, and macOS. Optional dependencies for specific embedding models such as CDPAM and DAC are installed separately. The test suite runs with python -m fadtk.test. Further details and a demo are available at https://fadtk.hydev.org/.

Highlighted Details

  • Extensive model support: Integrates numerous embedding models including CLAP, Encodec, VGGish, WavLM, and Whisper.
  • Flexible FAD computation: Supports FAD∞, per-song FAD for outlier analysis, and utilization of pre-computed statistics for rapid evaluation against baselines.
  • Reproducibility: Includes sample code and datasets (MusicCaps, FMA-Pop) from its associated research paper, facilitating reproducible evaluations.
  • Extensibility: Designed for easy addition of new embedding models by inheriting a base class.
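The extensibility point above follows a common plugin pattern, sketched here with a hypothetical base class. The names `EmbeddingModel`, `load_model`, `embed`, and `ToyEnergyModel` are assumptions made for illustration; FADtk's real base class and method names may differ, so consult the repository before subclassing.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class EmbeddingModel(ABC):
    """Hypothetical base class; NOT FADtk's actual interface."""

    def __init__(self, name: str, num_features: int, sample_rate: int) -> None:
        self.name = name
        self.num_features = num_features  # embedding dimensionality
        self.sample_rate = sample_rate    # expected input sample rate in Hz

    @abstractmethod
    def load_model(self) -> None:
        """Load weights or other resources needed before embedding."""

    @abstractmethod
    def embed(self, audio: Sequence[float]) -> List[List[float]]:
        """Return one embedding vector per frame of the input audio."""


class ToyEnergyModel(EmbeddingModel):
    """Toy stand-in for a real model: 2-D embedding per fixed-size frame."""

    def __init__(self, frame_size: int = 4) -> None:
        super().__init__("toy-energy", num_features=2, sample_rate=16000)
        self.frame_size = frame_size
        self.loaded = False

    def load_model(self) -> None:
        # A real model would load pre-trained weights here.
        self.loaded = True

    def embed(self, audio: Sequence[float]) -> List[List[float]]:
        fs = self.frame_size
        # Split into non-overlapping frames, dropping any incomplete tail.
        frames = [audio[i:i + fs] for i in range(0, len(audio) - fs + 1, fs)]
        # Features: mean amplitude and mean absolute amplitude of each frame.
        return [[sum(f) / len(f), sum(abs(x) for x in f) / len(f)] for f in frames]


model = ToyEnergyModel(frame_size=2)
model.load_model()
embeddings = model.embed([1.0, -1.0, 0.5, 0.5])
```

The design choice is that the evaluation code only depends on the base-class contract, so a new model needs to supply its loading and embedding logic and nothing else.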

Maintenance & Community

The project is associated with authors from Microsoft and academic institutions, as indicated by the paper citation. The README does not specify dedicated community channels (e.g., Discord, Slack) or a public roadmap.

Licensing & Compatibility

FADtk is released under the permissive MIT License. This license permits broad usage, including integration into commercial and closed-source projects, provided the copyright and license notice is retained in copies of the software.

Limitations & Caveats

Some embedding models, such as CDPAM and DAC, require separate installation beyond the default pip install fadtk command. The usefulness of FAD scores depends heavily on the choice of reference dataset and embedding model, as detailed in the project's best practices.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
