espnet_model_zoo  by espnet

SDK for managing pretrained ESPnet models, including Hugging Face models

created 5 years ago
255 stars

Top 99.2% on sourcepulse

Project Summary

This repository provides utilities for managing and downloading pretrained models for ESPnet, a popular end-to-end speech processing toolkit. It targets researchers and developers working with Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and speech separation, offering a streamlined way to access and utilize state-of-the-art models.

How It Works

The library acts as a centralized hub for ESPnet models, supporting models hosted on both Zenodo and Hugging Face. Users can download and unpack a model by specifying a Hugging Face ID, a Zenodo URL, or a tag listed in table.csv. For long audio in speech enhancement/separation, it supports segment-wise processing with configurable segment_size and hop_size.
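
The segment-wise idea can be pictured with a minimal sketch. It only shows how a long signal might be cut into overlapping windows from segment_size and hop_size. The function name segment_bounds, the choice of seconds as the unit, and the handling of the final shorter window are assumptions made for illustration, not the library's implementation.

```python
def segment_bounds(num_samples, sample_rate, segment_size, hop_size):
    """Return (start, end) sample indices of windows covering the signal.

    segment_size and hop_size are assumed here to be given in seconds.
    """
    seg = int(round(segment_size * sample_rate))
    hop = int(round(hop_size * sample_rate))
    if seg <= 0 or hop <= 0:
        raise ValueError("segment_size and hop_size must be positive")

    bounds = []
    start = 0
    while start < num_samples:
        # The last window is clipped to the end of the signal.
        end = min(start + seg, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += hop
    return bounds


# 10 s of 16 kHz audio, 4 s windows advanced by 2 s.
bounds = segment_bounds(
    num_samples=10 * 16000, sample_rate=16000, segment_size=4.0, hop_size=2.0
)
print(bounds)
```

Choosing a hop_size smaller than segment_size makes neighbouring windows overlap, which is what lets the per-window outputs be blended back into one signal.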

Quick Start & Requirements

  • Install via pip: pip install torch espnet_model_zoo
  • Requires Python and PyTorch. Specific model requirements (e.g., sampling rate) should be checked against the model's documentation.
  • Official documentation: https://github.com/espnet/espnet_model_zoo
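
After installation, a download could look roughly like the sketch below. It relies on the ModelDownloader class and its download_and_unpack method as described in the project README. The wrapper name fetch_pretrained and its cachedir argument are hypothetical, and the import is deferred so the snippet loads even where the package is not installed.

```python
def fetch_pretrained(name, cachedir=None):
    """Download and unpack a pretrained model, returning the unpacked files.

    name may be a Hugging Face ID, a Zenodo URL, or a tag from table.csv.
    """
    # Imported lazily so this file can be loaded without espnet_model_zoo.
    from espnet_model_zoo.downloader import ModelDownloader

    # With no cachedir, the downloader falls back to its default cache location.
    downloader = ModelDownloader(cachedir) if cachedir else ModelDownloader()
    # The result is a dict of configuration and model file paths.
    return downloader.download_and_unpack(name)
```

Calling fetch_pretrained with a real model name needs network access, and the returned dictionary is typically passed on to the matching ESPnet inference class.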

Highlighted Details

  • Supports ASR, TTS, and speech separation tasks.
  • Integrates with Hugging Face Hub for model discovery and download.
  • Provides command-line tools (espnet_model_zoo_query, espnet_model_zoo_download) for model management.
  • Allows specifying model download conditions (task, corpus, version) and direct URL downloads.
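
Selecting a model by condition can be thought of as filtering a table. The sketch below is a standalone illustration: the column names, the rows, and the select_model helper are all hypothetical stand-ins, and the real table.csv and the library's own query logic may differ.

```python
import csv
import io

# Hypothetical stand-in for table.csv; the real columns and rows may differ.
SAMPLE_TABLE = """task,corpus,name
asr,corpus_a,example/asr_a_v1
asr,corpus_a,example/asr_a_v2
tts,corpus_b,example/tts_b_v1
"""


def select_model(table_text, version=-1, **conditions):
    """Return the name of a row matching every condition.

    version indexes the matching rows in file order, so -1 picks the last one.
    """
    rows = list(csv.DictReader(io.StringIO(table_text)))
    matches = [
        row for row in rows
        if all(row.get(key) == value for key, value in conditions.items())
    ]
    if not matches:
        raise LookupError(f"no model matches {conditions}")
    return matches[version]["name"]


print(select_model(SAMPLE_TABLE, task="asr", corpus="corpus_a"))
```

Treating version as an index into the matches keeps "latest" as the default while still allowing an older entry to be pinned.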

Maintenance & Community

  • Developed by the ESPnet community.
  • Model registration involves creating Pull Requests to update table.csv.
  • Uploading models to Hugging Face is supported via a script.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but the underlying models may have different licenses. Users should verify the license of individual models.
  • Compatibility for commercial use depends on the specific model licenses.

Limitations & Caveats

  • Input audio must match the sampling rate the pretrained model expects; resample it first if it does not.
  • Older ESPnet versions (<=10.1) require a different ModelDownloader usage pattern.
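
A sampling-rate mismatch can be caught before inference with a small check. The sketch below uses only Python's standard wave module. The helper names and the 16000 Hz target are assumptions for illustration, and the actual rate must come from the model's documentation.

```python
import io
import wave


def wav_sample_rate(wav_bytes):
    """Read the sampling rate from the header of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


def needs_resampling(wav_bytes, model_rate):
    """True when the audio's rate differs from the rate the model expects."""
    return wav_sample_rate(wav_bytes) != model_rate


def make_silent_wav(sample_rate, num_frames=100):
    """Build a tiny mono 16-bit WAV in memory, used only for this demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


# Assumed model rate of 16 kHz, checked against a 44.1 kHz recording.
print(needs_resampling(make_silent_wav(44100), model_rate=16000))
```

The check only detects the mismatch; the resampling itself would be done with an audio library of the user's choice.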

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
