espnet_model_zoo by espnet

SDK for managing pretrained ESPnet models, including Hugging Face models

Created 5 years ago

259 stars

Top 97.9% on SourcePulse

1 Expert Loves This Project

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

This repository provides utilities for downloading and managing pretrained models for ESPnet, an end-to-end speech processing toolkit. It targets researchers and developers working on Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and speech separation, giving them a single interface for finding and loading published models.

How It Works

The library acts as a central access point for ESPnet models hosted on either Zenodo or Hugging Face. Users download and unpack a model by specifying a Hugging Face ID, a Zenodo URL, or a tag listed in table.csv. For long audio in speech enhancement/separation, it supports segment-wise processing with configurable segment_size and hop_size.
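To make the segment-wise idea concrete, here is a minimal sketch of overlapping-window processing in plain Python. It is not ESPnet's implementation: the function name `process_in_segments`, the `process` callback, and the choice to average overlapping regions are all assumptions made for illustration. The sketch also measures `segment_size` and `hop_size` in samples, which may differ from the units the real parameters use.

```python
from typing import Callable, List, Sequence


def process_in_segments(
    signal: Sequence[float],
    segment_size: int,
    hop_size: int,
    process: Callable[[List[float]], List[float]],
) -> List[float]:
    """Apply `process` to overlapping windows and average the overlaps.

    Hypothetical helper: `process` stands in for an enhancement or
    separation model and must return as many samples as it receives.
    """
    if segment_size <= 0 or hop_size <= 0 or hop_size > segment_size:
        raise ValueError("need 0 < hop_size <= segment_size")

    n = len(signal)
    summed = [0.0] * n   # running sum of processed samples
    counts = [0] * n     # how many windows covered each sample

    start = 0
    while start < n:
        end = min(start + segment_size, n)
        processed = process(list(signal[start:end]))
        for offset, value in enumerate(processed):
            summed[start + offset] += value
            counts[start + offset] += 1
        if end == n:
            break  # the last window reached the end of the signal
        start += hop_size

    # Every sample is covered at least once because hop_size <= segment_size.
    return [s / c for s, c in zip(summed, counts)]
```

A smaller `hop_size` means more overlap between windows, which smooths the seams between segments at the cost of more model calls.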

Quick Start & Requirements

Install via pip: pip install torch espnet_model_zoo
Requires Python and PyTorch. Specific model requirements (e.g., sampling rate) should be checked against the model's documentation.
Official documentation: https://github.com/espnet/espnet_model_zoo
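Once the package is installed, fetching a model from Python can be sketched as below. The wrapper `fetch_model` is a hypothetical helper, not part of the library. It relies on `ModelDownloader` and its `download_and_unpack` method from `espnet_model_zoo.downloader`, and the exact signatures should be checked against the installed version.

```python
def fetch_model(name, cachedir=None):
    """Download and unpack a model, returning what the downloader reports.

    `name` may be a Hugging Face ID, a Zenodo URL, or a tag from table.csv.
    `cachedir` is optional; when omitted the downloader picks its default.
    """
    # Imported lazily so this file can be loaded without the package installed.
    from espnet_model_zoo.downloader import ModelDownloader

    downloader = ModelDownloader(cachedir) if cachedir else ModelDownloader()
    # The result is expected to be a dict mapping argument names to the
    # unpacked file paths (config file, model file, and so on).
    return downloader.download_and_unpack(name)
```

Calling `fetch_model("<model name>")` needs network access on first use. Later calls should reuse the files already in the cache directory.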

Highlighted Details

Supports ASR, TTS, and speech separation tasks.
Integrates with Hugging Face Hub for model discovery and download.
Provides command-line tools (espnet_model_zoo_query, espnet_model_zoo_download) for model management.
Lets users select a model by query conditions (task, corpus, version) or download one directly from a URL.
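Selecting by condition amounts to filtering a table of models on column values. The sketch below illustrates this with the standard `csv` module. The rows in `SAMPLE_TABLE`, the column set, and the `select_models` function are all hypothetical and do not reproduce the real table.csv or the package's own query code.

```python
import csv
import io

# Hypothetical rows for illustration only; the real table.csv differs.
SAMPLE_TABLE = """name,task,corpus,version
example/asr_model_a,asr,wsj,0.9.0
example/asr_model_b,asr,librispeech,0.9.0
example/tts_model_c,tts,ljspeech,0.9.0
"""


def select_models(table_text, key="name", **conditions):
    """Return column `key` for every row that matches all conditions."""
    rows = csv.DictReader(io.StringIO(table_text))
    return [
        row[key]
        for row in rows
        if all(row.get(column) == value for column, value in conditions.items())
    ]


# Narrow the selection by adding conditions.
asr_models = select_models(SAMPLE_TABLE, task="asr")
wsj_asr_models = select_models(SAMPLE_TABLE, task="asr", corpus="wsj")
```

Each extra condition can only shrink the result, so a query that returns several names can be narrowed by adding a corpus or version.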

Maintenance & Community

Developed by the ESPnet community.
Model registration involves creating Pull Requests to update table.csv.
Uploading models to Hugging Face is supported via a script.

Licensing & Compatibility

The repository itself appears to be under a permissive license, but the underlying models may carry different licenses.
Users should verify the license of each model they use; suitability for commercial use depends on it.

Limitations & Caveats

Users must ensure their audio data's sampling rate matches the pretrained model's requirements, potentially necessitating resampling.
For older ESPnet versions (<=10.1), the ModelDownloader API must be used in a different way.
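The sampling-rate caveat can be checked before inference. This minimal sketch reads the rate from a WAV header with the standard `wave` module. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical, and the rate a given model expects has to come from that model's own documentation.

```python
import wave


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, model_rate):
    """True when the file's rate differs from the rate the model expects."""
    return wav_sample_rate(path) != model_rate
```

If `needs_resampling` returns True, resample the audio with a dedicated audio library before passing it to the model. The `wave` module only handles uncompressed WAV files, so other formats need a different reader.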

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days