espnet_model_zoo by espnet

SDK for managing pretrained ESPnet models, including Hugging Face models

Created 5 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

This repository provides utilities for managing and downloading pretrained models for ESPnet, a popular end-to-end speech processing toolkit. It targets researchers and developers working on automatic speech recognition (ASR), text-to-speech (TTS), and speech separation, and offers a single interface for finding, downloading, and unpacking pretrained models.

How It Works

The library acts as a central hub for ESPnet models hosted on both Zenodo and Hugging Face. A model is downloaded and unpacked by specifying a Hugging Face ID, a Zenodo URL, or a tag listed in table.csv. For processing long audio in speech enhancement and separation, it supports segment-wise processing with configurable segment_size and hop_size parameters.
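
Segment-wise processing amounts to sliding a fixed-length window over the waveform. The sketch below shows one way the window boundaries could be computed. segment_bounds is a hypothetical helper, not ESPnet's implementation, and it assumes segment_size and hop_size are given in seconds and converted to samples with the sampling rate fs. It only produces (start, end) indices; merging the overlapping outputs of each segment is left out.

```python
from typing import List, Tuple


def segment_bounds(
    num_samples: int, fs: int, segment_size: float, hop_size: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows.

    Hypothetical helper: segment_size and hop_size are assumed to be in seconds.
    The final window is clipped at the end of the signal.
    """
    seg = int(round(segment_size * fs))  # window length in samples
    hop = int(round(hop_size * fs))      # shift between window starts in samples
    if seg <= 0 or hop <= 0:
        raise ValueError("segment_size and hop_size must be positive")
    if hop > seg:
        raise ValueError("hop_size larger than segment_size would skip samples")
    if num_samples <= seg:
        return [(0, num_samples)]  # short input: a single segment
    bounds = []
    start = 0
    while True:
        end = start + seg
        if end >= num_samples:
            bounds.append((start, num_samples))  # clip the last window
            break
        bounds.append((start, end))
        start += hop
    return bounds


# 5 s of 16 kHz audio split into 2.4 s windows that start every 0.8 s.
windows = segment_bounds(5 * 16000, 16000, segment_size=2.4, hop_size=0.8)
```

Choosing hop_size smaller than segment_size makes consecutive windows overlap, so every sample is covered at least once and boundary effects can be smoothed when the per-segment outputs are combined.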

Quick Start & Requirements

  • Install via pip: pip install torch espnet_model_zoo
  • Requires Python and PyTorch. Model-specific requirements (e.g., the expected sampling rate) should be checked in each model's documentation.
  • Repository and documentation: https://github.com/espnet/espnet_model_zoo
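After installation, models are fetched through the ModelDownloader class in espnet_model_zoo.downloader. The sketch below wraps it in a hypothetical helper, fetch_pretrained, which is not part of the library. It assumes download_and_unpack returns a dictionary of local paths to the unpacked config and model files; check the README for the exact keys a given model produces. The import is deferred so the snippet loads even without the package installed.

```python
from typing import Optional


def fetch_pretrained(name: str, cachedir: Optional[str] = None) -> dict:
    """Download (if needed) and unpack a pretrained ESPnet model.

    Hypothetical wrapper. `name` may be a Hugging Face ID, a Zenodo URL,
    or a tag listed in table.csv.
    """
    # Deferred import: this module loads even if espnet_model_zoo is absent.
    from espnet_model_zoo.downloader import ModelDownloader

    # Without a cachedir, ModelDownloader uses its own default cache location.
    downloader = ModelDownloader(cachedir) if cachedir else ModelDownloader()
    # Assumed to return a dict mapping file roles (config, model, ...) to local paths.
    return downloader.download_and_unpack(name)


# Example (requires network access and the espnet_model_zoo package):
# paths = fetch_pretrained("<huggingface_id or table.csv tag>")
```

The first call for a given model needs network access; later calls should find the unpacked copy in the cache directory.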

Highlighted Details

  • Supports ASR, TTS, and speech separation tasks.
  • Integrates with the Hugging Face Hub for model discovery and download.
  • Provides command-line tools (espnet_model_zoo_query, espnet_model_zoo_download) for model management.
  • Allows selecting a model by conditions (task, corpus, version) as well as downloading directly from a URL.
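Condition-based selection can be illustrated offline. The sketch below filters a tiny, made-up CSV in the spirit of table.csv; the column names, rows, and the select_model helper are all hypothetical. It also assumes that when several models match, rows are ordered oldest to newest and version indexes the matches Python-style (-1 for the most recent, -2 for the one before).

```python
import csv
import io
from typing import List

# Hypothetical stand-in for table.csv; the real file's columns and rows differ.
TABLE_CSV = """name,task,corpus
demo/asr_wsj_v1,asr,wsj
demo/asr_wsj_v2,asr,wsj
demo/tts_ljspeech_v1,tts,ljspeech
"""


def select_model(table_csv: str, version: int = -1, **conditions: str) -> str:
    """Return the name of a model whose columns match all `conditions`.

    Assumption: `version` indexes the matching rows Python-style,
    so -1 is the newest match and -2 the previous one.
    """
    rows: List[dict] = list(csv.DictReader(io.StringIO(table_csv)))
    matches = [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]
    if not matches:
        raise LookupError(f"no model matches {conditions}")
    return matches[version]["name"]


latest_wsj = select_model(TABLE_CSV, task="asr", corpus="wsj")
```

In espnet_model_zoo itself the conditions are handed to the downloader rather than to a standalone helper; this only mirrors the selection step.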

Maintenance & Community

  • Developed by the ESPnet community.
  • Registering a model involves opening a pull request that updates table.csv.
  • Uploading models to Hugging Face is supported via a script.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but the underlying models may have different licenses. Users should verify the license of individual models.
  • Compatibility for commercial use depends on the specific model licenses.

Limitations & Caveats

  • Users must ensure their audio data's sampling rate matches the pretrained model's requirements, potentially necessitating resampling.
  • Older ESPnet versions (0.10.1 and earlier) require a different usage of the ModelDownloader API.
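
When the sampling rates differ, the audio must be converted before inference. The sketch below shows the idea with plain linear interpolation; match_sampling_rate is a hypothetical helper. Linear interpolation applies no anti-aliasing filter, so for real audio a band-limited resampler such as scipy.signal.resample_poly is the better choice; this version only illustrates the rate conversion.

```python
import math
from typing import List, Sequence


def match_sampling_rate(
    samples: Sequence[float], rate: int, target_rate: int
) -> List[float]:
    """Resample `samples` from `rate` to `target_rate` by linear interpolation.

    Illustrative only: no anti-aliasing filter is applied, so downsampling
    with this function can alias.
    """
    if rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if rate == target_rate or not samples:
        return list(samples)  # nothing to convert
    n_out = max(1, round(len(samples) * target_rate / rate))
    step = rate / target_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        t = i * step
        i0 = math.floor(t)
        if i0 + 1 < len(samples):
            frac = t - i0
            out.append(samples[i0] * (1.0 - frac) + samples[i0 + 1] * frac)
        else:
            out.append(samples[-1])  # hold the last sample past the end
    return out
```

For example, converting 48 kHz audio for a model trained on 16 kHz data with match_sampling_rate(samples, 48000, 16000) returns one third as many samples.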

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Travis Fischer (founder of Agentic), and 2 more.

modelscope by modelscope

Model-as-a-Service library for model inference, training, and evaluation

Created 3 years ago
Updated 1 day ago
8k stars