STT by coqui-ai

Deep learning toolkit for speech-to-text model training and deployment

Created 4 years ago
2,517 stars

Top 18.5% on SourcePulse

Project Summary

Coqui STT (🐸STT) is a deep learning toolkit for training and deploying speech-to-text (STT) models, designed for both production and research use. It offers high-quality pre-trained models, an efficient multi-GPU training pipeline, streaming inference, and real-time capabilities with small acoustic model footprints.

How It Works

🐸STT leverages deep learning architectures for its STT models, enabling efficient training and deployment. Its design prioritizes speed and a small model footprint, making it suitable for resource-constrained environments. The toolkit supports multi-GPU training for faster model development and offers streaming inference for real-time applications.
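
As a rough illustration of the streaming workflow described above, the sketch below feeds audio to a model in small chunks rather than all at once. It assumes the `stt` Python package and its `Model.createStream`, `feedAudioContent`, and `finishStream` calls, plus NumPy, and a 16-bit mono WAV file recorded at the model's sample rate. The helper names `pcm_chunks` and `transcribe_streaming`, the file paths, and the 20 ms chunk length are hypothetical choices for this example, not part of the project.

```python
import wave
from typing import Iterator, Sequence


def pcm_chunks(samples: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size samples (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def transcribe_streaming(model_path: str, wav_path: str, chunk_ms: int = 20) -> str:
    """Sketch: transcribe a WAV file by feeding it to a stream chunk by chunk."""
    # Imported lazily so pcm_chunks can be used without these packages installed.
    import numpy as np
    from stt import Model

    model = Model(model_path)

    # Assumption: the file is 16-bit mono PCM at model.sampleRate().
    with wave.open(wav_path, "rb") as wav:
        samples = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    # Number of samples that correspond to chunk_ms milliseconds of audio.
    chunk_size = model.sampleRate() * chunk_ms // 1000

    stream = model.createStream()
    for chunk in pcm_chunks(samples, chunk_size):
        stream.feedAudioContent(chunk)
    # finishStream() runs the final decode and returns the transcript text.
    return stream.finishStream()
```

In a live application the chunks would come from a microphone or network source instead of a file, and `intermediateDecode()` on the stream could be called between chunks to show partial results.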

Quick Start & Requirements

Installation and usage details are available in the official documentation: stt.readthedocs.io.

Highlighted Details

  • Battle-tested in production and research environments.
  • Supports multiple transcripts with associated confidence scores.
  • Provides bindings for various programming languages.
  • Offers streaming and real-time inference capabilities.
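
The multiple-transcript feature in the list above can be sketched as follows. This assumes the `stt` Python package, whose `Model.sttWithMetadata` call returns candidate transcripts that each carry a `confidence` value and a list of `tokens`. The function names `ranked_candidates` and `top_transcripts` are hypothetical, chosen only for this example.

```python
from typing import List, Tuple


def ranked_candidates(metadata) -> List[Tuple[str, float]]:
    """Return (text, confidence) pairs, highest confidence first.

    `metadata` is expected to look like the object returned by
    sttWithMetadata: it has `.transcripts`, and each transcript has
    `.confidence` and `.tokens`, where each token has `.text`.
    """
    pairs = []
    for transcript in metadata.transcripts:
        # Tokens are individual text units; joining them gives the transcript.
        text = "".join(token.text for token in transcript.tokens)
        pairs.append((text, transcript.confidence))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def top_transcripts(model_path: str, audio, num_results: int = 3):
    """Sketch: ask the model for several candidates and rank them."""
    from stt import Model  # imported lazily; requires the stt package

    model = Model(model_path)
    # Assumption: `audio` is a NumPy int16 array at model.sampleRate().
    metadata = model.sttWithMetadata(audio, num_results)
    return ranked_candidates(metadata)
```

The confidence values are approximate scores, so they are best used to compare candidates for the same audio rather than read as probabilities.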

Maintenance & Community

This project is no longer actively maintained, and the online Model Zoo has been discontinued. Attention in the field has shifted to newer STT models such as Whisper, and Coqui itself moved its focus to Coqui TTS and Coqui Studio. Previously released models remain available in the coqui-ai/STT-models repository. Community support is available via GitHub Discussions and a Gitter room.

Licensing & Compatibility

The README excerpt summarized here does not state the license, although the project is open source. Check the license terms of both the codebase and the individual models before using them commercially or linking them from closed-source software.

Limitations & Caveats

The README states that the project is no longer actively maintained and that the online Model Zoo has been shut down. Expect no further development, bug fixes, or new features, which makes it a risky choice for long-term adoption.

Health Check

Last Commit: 1 year ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 21 stars in the last 30 days
