speech_recognition  by Uberi

Python SDK for speech recognition, supporting online/offline engines and APIs

created 11 years ago
8,813 stars

Top 5.9% on sourcepulse

Project Summary

This Python library provides a unified interface for speech recognition, supporting numerous online and offline engines and APIs. It's designed for developers and researchers needing flexible speech-to-text capabilities, offering a single API to abstract away the complexities of different services.

How It Works

The library acts as a wrapper, abstracting various speech recognition engines and APIs into a consistent Python interface. It handles audio input (from microphones or files), formats it, and sends it to the chosen backend. The results are then parsed and returned. This approach allows users to easily switch between different recognition services without modifying their core application logic, leveraging the strengths of each backend (e.g., offline processing with CMU Sphinx or Vosk, high accuracy with cloud APIs).
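A minimal sketch of the backend switching described above, assuming the `recognize_<backend>` method naming found on the library's `Recognizer` class (for example `recognize_sphinx` and `recognize_google`). The helper names `transcribe_with_fallback` and `transcribe_file`, and the file name `speech.wav`, are hypothetical and chosen only for illustration.

```python
from typing import Any, Optional, Sequence, Tuple, Type


def transcribe_with_fallback(
    recognizer: Any,
    audio: Any,
    backends: Sequence[str],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[Optional[str], Optional[str]]:
    """Call recognize_<backend>(audio) for each name in order.

    Returns (backend_name, text) for the first backend that succeeds,
    or (None, None) if every backend is missing or fails.
    """
    for name in backends:
        recognize = getattr(recognizer, f"recognize_{name}", None)
        if recognize is None:
            continue  # no such method on this recognizer
        try:
            return name, recognize(audio)
        except errors:
            continue  # fall through to the next backend
    return None, None


def transcribe_file(path: str = "speech.wav") -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical entry point: try offline Sphinx first, then the Google Web Speech API."""
    # Imported lazily so the helper above can be used without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return transcribe_with_fallback(
        recognizer,
        audio,
        ["sphinx", "google"],
        errors=(sr.UnknownValueError, sr.RequestError),
    )
```

Because the backend is selected by name, changing the order or swapping services is a change to the `backends` list rather than to the calling code.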

Quick Start & Requirements

  • Install via pip: pip install SpeechRecognition
  • Python 3.9+ required.
  • Optional dependencies for specific backends: PyAudio (microphone input), PocketSphinx (offline), Vosk (offline), Whisper (offline), Google Cloud Speech API, OpenAI Whisper API, Groq Whisper API.
  • See official documentation for detailed installation and usage.
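To check the requirements listed above before picking a backend, something like the following could be used. It relies only on the standard library. The import names in `OPTIONAL_MODULES` are assumptions about how each optional package is imported, and `is_installed` and `environment_report` are hypothetical helper names.

```python
import importlib.util
import sys
from typing import Dict, Mapping

# Assumed import names for some of the optional dependencies listed above.
OPTIONAL_MODULES = {
    "PyAudio (microphone input)": "pyaudio",
    "PocketSphinx (offline)": "pocketsphinx",
    "Vosk (offline)": "vosk",
    "Whisper (offline)": "whisper",
    "OpenAI Whisper API": "openai",
    "Groq Whisper API": "groq",
}


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def environment_report(modules: Mapping[str, str] = OPTIONAL_MODULES) -> Dict[str, bool]:
    """Map each requirement label to whether it is satisfied here."""
    report = {"Python 3.9+": sys.version_info >= (3, 9)}
    for label, module_name in modules.items():
        report[label] = is_installed(module_name)
    return report


for label, ok in environment_report().items():
    print(f"{'ok     ' if ok else 'missing'}  {label}")
```

A "missing" entry only means that backend is unavailable; the core library works without any of the optional packages.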

Highlighted Details

  • Supports 10+ speech recognition engines and APIs, including offline options like CMU Sphinx, Vosk, and OpenAI Whisper.
  • Provides utilities for microphone input, audio file transcription, and energy threshold calibration.
  • Includes example scripts for common use cases like microphone recognition and audio file transcription.
  • Offers detailed troubleshooting guides for common issues like microphone detection and noise sensitivity.
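The microphone and energy threshold utilities mentioned above might be combined roughly as follows. This is a sketch that assumes PyAudio is installed and a default microphone is available. It uses `Recognizer.adjust_for_ambient_noise`, `Recognizer.energy_threshold`, and `Recognizer.listen` from the library; `calibrate_and_listen` and `listen_from_microphone` are hypothetical helper names, and the default durations are illustrative.

```python
from typing import Any, Tuple


def calibrate_and_listen(
    recognizer: Any,
    source: Any,
    calibration_seconds: float = 1.0,
    phrase_time_limit: float = 10.0,
) -> Tuple[float, Any]:
    """Calibrate against background noise, then capture one phrase.

    Returns the energy threshold after calibration and the captured audio.
    """
    # Sample ambient noise so quiet background sound is not treated as speech.
    recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
    threshold = recognizer.energy_threshold
    # Block until a phrase is heard, capped at phrase_time_limit seconds.
    audio = recognizer.listen(source, phrase_time_limit=phrase_time_limit)
    return threshold, audio


def listen_from_microphone() -> Tuple[float, Any]:
    """Hypothetical entry point using the default system microphone."""
    # Imported lazily; requires SpeechRecognition and PyAudio.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        return calibrate_and_listen(recognizer, source)
```

Printing the returned threshold is a quick way to see whether a noisy room is pushing it high enough to mask quiet speech, which is the kind of noise sensitivity issue the troubleshooting guides address.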

Maintenance & Community

The project is seeking collaborators as of February 2022. The primary author is Anthony Zhang. Bug reports and suggestions can be made via the issue tracker.

Licensing & Compatibility

The library is released under the 3-clause BSD license. Language files from CMU Sphinx are BSD-licensed. Binaries from FLAC are GPLv2-licensed, but the library's usage of them is considered a "mere aggregation" and does not impose GPL restrictions on the library itself or programs using it.

Limitations & Caveats

The maintainer has limited time to manage PRs and issues and is seeking collaborators. Some cloud API integrations require authentication setup; for example, the Google Cloud Speech API needs credentials available on the local machine. The library supports only Google Cloud Speech API v1, not v2.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 137 stars in the last 90 days
