SDK for managing pretrained ESPnet models, including Hugging Face models
This repository provides utilities for managing and downloading pretrained models for ESPnet, a popular end-to-end speech processing toolkit. It targets researchers and developers working with Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and speech separation, offering a streamlined way to access and utilize state-of-the-art models.
How It Works
The library acts as a centralized hub for ESPnet models, supporting models hosted on both Zenodo and Hugging Face. It allows users to download and unpack a model by specifying a Hugging Face ID, a Zenodo URL, or a tag listed in table.csv. For long audio in speech enhancement/separation, it supports segment-wise processing with configurable segment_size and hop_size.
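To make the segment-wise idea concrete, here is a minimal sketch of how a long recording could be cut into overlapping windows from a segment length and a hop length. It is not ESPnet's implementation: the helper names seconds_to_samples and segment_bounds are hypothetical, and treating segment_size and hop_size as durations in seconds is an assumption.

```python
from typing import List, Tuple


def seconds_to_samples(seconds: float, sample_rate: int) -> int:
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(seconds * sample_rate))


def segment_bounds(
    num_samples: int, segment_len: int, hop_len: int
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping segments.

    Hypothetical helper, not part of espnet_model_zoo. Consecutive
    segments start hop_len samples apart; the last one is clipped to
    the end of the signal.
    """
    if segment_len <= 0 or hop_len <= 0:
        raise ValueError("segment_len and hop_len must be positive")
    if hop_len > segment_len:
        raise ValueError("hop_len larger than segment_len would leave gaps")
    if num_samples <= 0:
        return []

    bounds = []
    start = 0
    while True:
        end = min(start + segment_len, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break
        start += hop_len
    return bounds
```

With a 4-sample segment and a 2-sample hop, a 10-sample signal yields four windows that each overlap their neighbour by half. A real enhancement pipeline would also need to blend the overlapping outputs back together, which this sketch leaves out.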
Quick Start & Requirements
pip install torch espnet_model_zoo
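Once the package is installed, downloading a model might look like the sketch below. ModelDownloader and download_and_unpack come from the project's documented API; the wrapper fetch_model and its cachedir handling are hypothetical conveniences, and the model ID in the comment is only an example taken from the upstream README.

```python
from typing import Any, Dict, Optional


def fetch_model(name: str, cachedir: Optional[str] = None) -> Dict[str, Any]:
    """Download and unpack a pretrained model, returning its file paths.

    `name` may be a Hugging Face ID, a Zenodo URL, or a tag from table.csv.
    Hypothetical wrapper: only ModelDownloader and download_and_unpack
    belong to espnet_model_zoo.
    """
    # Imported lazily so this module still loads where the package is absent.
    from espnet_model_zoo.downloader import ModelDownloader

    # With no cachedir, the library falls back to its default cache location.
    downloader = ModelDownloader(cachedir) if cachedir else ModelDownloader()
    return downloader.download_and_unpack(name)


# Example call (requires network access and the installed package):
# files = fetch_model("kamo-naoki/mini_an4_asr_train_raw_bpe_valid.acc.best")
```

The returned mapping holds the paths of the unpacked config and model files, which can then be passed to the matching ESPnet inference class.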
Highlighted Details
Command-line tools (espnet_model_zoo_query, espnet_model_zoo_download) for model management.
Maintenance & Community
table.csv.
Licensing & Compatibility
Limitations & Caveats
ModelDownloader API usage is required.