silero-models by snakers4

Pre-trained STT/TTS/text-enhancement models made simple

Created 5 years ago

5,704 stars

Top 8.8% on SourcePulse

3 Experts Love This Project

Jeremy Howard, Cofounder of fast.ai
Luis Capelo, Cofounder of Lightning AI
Andrey Vasnetsov, Cofounder of Qdrant

Project Summary

Silero Models provides pre-trained Speech-to-Text (STT), Text-to-Speech (TTS), and Text Enhancement (TE) models that the project describes as enterprise-grade. It aims to make these models simple to use, claiming quality comparable to or better than industry leaders such as Google, with minimal dependencies and no compilation step. The project targets developers and researchers who need robust, efficient speech processing.

How It Works

Silero Models is built on PyTorch: models can be loaded through PyTorch Hub or installed with pip. For broader deployment, ONNX and TensorFlow SavedModel formats are also provided. The models are designed for efficiency, and the TTS models can run faster than real time on a single CPU thread. The project emphasizes ease of use, with one-line Python integration for most functionality.
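As a rough illustration of the PyTorch Hub route described above, the sketch below builds the arguments for torch.hub.load and wraps the call in a function. The entry-point name "silero_tts", the language code "ru", the speaker id "v3_1_ru", and the two-value return are assumptions recalled from the project's README rather than anything stated on this page, so check them against the current repository before relying on them. The helper names tts_hub_kwargs and load_tts are hypothetical.

```python
def tts_hub_kwargs(language="ru", speaker="v3_1_ru"):
    """Build keyword arguments for torch.hub.load.

    Assumption: "silero_tts" is the hub entry point and the defaults are
    valid language/speaker ids; none of these come from this page.
    """
    return {
        "repo_or_dir": "snakers4/silero-models",  # GitHub owner/repo
        "model": "silero_tts",                    # assumed entry-point name
        "language": language,
        "speaker": speaker,
    }


def load_tts(language="ru", speaker="v3_1_ru"):
    """Download (on first use) and return a TTS model via PyTorch Hub."""
    import torch  # imported lazily so the helper above works without torch

    # Assumption: the entry point returns (model, example_text).
    model, example_text = torch.hub.load(**tts_hub_kwargs(language, speaker))
    return model, example_text
```

Calling load_tts() needs network access the first time, since the model weights are downloaded on demand.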

Quick Start & Requirements

Installation: run pip install silero, or load models directly with torch.hub.load.
Dependencies: PyTorch (1.8+ for STT, 1.10+ for TTS, 1.9+ for TE, 2.0+ for Denoise), torchaudio, omegaconf. ONNX/TensorFlow formats require respective runtimes.
Resources: Models are downloaded on demand. CPU inference is supported and often sufficient due to model speed.
Documentation: Colab examples, Wiki.
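Because each model family has its own minimum PyTorch version, it can help to check an installed version against the list above before loading anything. The following is a minimal sketch under that assumption: the version floors are copied from the dependency line, while the MIN_TORCH table and the helper names parse_version and supported_families are hypothetical and not part of the silero package.

```python
import re

# Minimum PyTorch (major, minor) per model family, from the dependency list.
MIN_TORCH = {
    "stt": (1, 8),
    "te": (1, 9),
    "tts": (1, 10),
    "denoise": (2, 0),
}


def parse_version(version):
    """Extract (major, minor) from a string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supported_families(torch_version):
    """Return the model families whose minimum version is satisfied."""
    installed = parse_version(torch_version)
    # Tuples compare numerically, so (1, 10) correctly ranks above (1, 9).
    return [name for name, minimum in MIN_TORCH.items() if installed >= minimum]


print(supported_families("1.9.1"))
```

In practice the argument would be torch.__version__. Versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.10" below "1.9".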

Highlighted Details

Supports multiple languages for STT (English, German, Spanish, Ukrainian) and TTS (Russian, Ukrainian, Uzbek, Indic languages, and other Cyrillic-script languages).
Offers various model sizes and formats (PyTorch JIT, ONNX, TensorFlow) for STT, with quantization options.
TTS models support SSML for advanced speech synthesis control and offer a wide range of speakers and sample rates.
Includes a text enhancement model for punctuation and capitalization, and denoise models for audio cleanup.

Maintenance & Community

The project is actively maintained by the Silero Team. Community engagement is encouraged via GitHub issues and a contact email for inquiries.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Commercial inquiries are directed to the wiki and a specific licensing page, suggesting potential tiered licensing or commercial use restrictions.

Limitations & Caveats

Without a stated license, commercial use or integration into closed-source projects may be blocked until the terms are clarified. Although many models are available in ONNX and TensorFlow formats, development and examples focus primarily on PyTorch.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Star History: 53 stars in the last 30 days