speech_recognition by Uberi

Python SDK for speech recognition, supporting online/offline engines and APIs

Created 11 years ago

8,922 stars

Top 5.8% on SourcePulse

Project Summary

This Python library provides a unified interface for speech recognition, supporting numerous online and offline engines and APIs. It's designed for developers and researchers needing flexible speech-to-text capabilities, offering a single API to abstract away the complexities of different services.

How It Works

The library acts as a wrapper that exposes many speech recognition engines and APIs through one consistent Python interface. It captures audio from a microphone or reads it from a file, converts it into the format the chosen backend expects, sends it to that backend, and parses the response into a result. Because every backend sits behind the same interface, users can switch services without changing their core application logic. They can also pick the backend that fits the task, for example offline processing with CMU Sphinx or Vosk, or the typically higher accuracy of cloud APIs.
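A minimal sketch of the backend-switching idea follows. It assumes the library's `speech_recognition.Recognizer` class and its `recognize_sphinx`, `recognize_google`, and `recognize_whisper` methods. The `BACKENDS` table and the `transcribe` helper are hypothetical and not part of the library. They only show how one call site can route the same audio to different engines.

```python
from typing import Any, Callable, Dict

# Hypothetical lookup table: short backend name -> Recognizer method name.
# recognize_sphinx, recognize_google and recognize_whisper are methods of
# speech_recognition.Recognizer; any entry you add is your own assumption.
BACKENDS: Dict[str, str] = {
    "sphinx": "recognize_sphinx",    # offline, needs PocketSphinx
    "google": "recognize_google",    # online web API
    "whisper": "recognize_whisper",  # offline, needs the Whisper package
}


def transcribe(recognizer: Any, audio: Any, backend: str, **options: Any) -> str:
    """Send `audio` to the chosen backend; the caller's code stays the same."""
    try:
        method_name = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    method: Callable[..., str] = getattr(recognizer, method_name)
    # Extra keyword options (e.g. language="en-US") pass straight through.
    return method(audio, **options)


# Usage with the real library (assumes SpeechRecognition and PocketSphinx
# are installed and that a file named "speech.wav" exists):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.AudioFile("speech.wav") as source:
#       audio = r.record(source)
#   print(transcribe(r, audio, "sphinx"))
```

Keeping the backend choice in one table makes swapping engines a one-word change at the call site, and per-engine options such as `language` are forwarded unchanged to the underlying method.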

Quick Start & Requirements

Install via pip: pip install SpeechRecognition
Python 3.9+ required.
Optional dependencies for specific backends: PyAudio (microphone input), PocketSphinx (offline), Vosk (offline), Whisper (offline), and the client libraries for the Google Cloud Speech API, OpenAI Whisper API, and Groq Whisper API.
See the official documentation for detailed installation and usage instructions.
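To see which of these optional backends can be imported in the current environment, a small check like the one below can help. It is a sketch only. The import names (`pyaudio`, `pocketsphinx`, `vosk`, `whisper`, `google.cloud.speech`, `openai`, `groq`) are assumptions based on each package's usual module name and do not come from the SpeechRecognition docs. The helpers `available` and `report` are hypothetical.

```python
import importlib.util
from typing import Dict

# Assumed import names for each optional dependency; verify them against
# the official installation instructions.
OPTIONAL_MODULES: Dict[str, str] = {
    "PyAudio (microphone input)": "pyaudio",
    "PocketSphinx (offline)": "pocketsphinx",
    "Vosk (offline)": "vosk",
    "Whisper (offline)": "whisper",
    "Google Cloud Speech API": "google.cloud.speech",
    "OpenAI Whisper API": "openai",
    "Groq Whisper API": "groq",
}


def available(module_name: str) -> bool:
    """Return True if `module_name` can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent packages first;
        # a missing parent (e.g. no "google" package) raises instead.
        return False


def report() -> Dict[str, bool]:
    """Map each optional dependency label to whether it is importable."""
    return {label: available(mod) for label, mod in OPTIONAL_MODULES.items()}


# Example:
#   for label, ok in report().items():
#       print(f"{label}: {'installed' if ok else 'missing'}")
```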

Highlighted Details

Supports 10+ speech recognition engines and APIs, including offline options like CMU Sphinx, Vosk, and OpenAI Whisper.
Provides utilities for microphone input, audio file transcription, and energy threshold calibration.
Includes example scripts for common use cases like microphone recognition and audio file transcription.
Offers detailed troubleshooting guides for common issues like microphone detection and noise sensitivity.
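The energy threshold calibration is exposed through `Recognizer.adjust_for_ambient_noise(source, duration=1)`, which updates `Recognizer.energy_threshold`. The sketch below illustrates the general idea under stated assumptions. It measures the RMS energy of ambient 16-bit mono PCM chunks and moves the threshold toward a multiple of that energy with exponential damping. The helpers `rms16` and `calibrate` are hypothetical. The defaults (starting threshold 300, ratio 1.5, damping 0.15) are assumptions modeled on the library's dynamic-energy settings, so check them against the source before relying on them.

```python
import math
import sys
from array import array
from typing import Iterable


def rms16(chunk: bytes) -> float:
    """RMS energy of little-endian signed 16-bit mono PCM (even length)."""
    samples = array("h")
    samples.frombytes(chunk)
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; input is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate(chunks: Iterable[bytes], seconds_per_chunk: float,
              threshold: float = 300.0, ratio: float = 1.5,
              damping: float = 0.15) -> float:
    """Move `threshold` toward ratio * ambient RMS energy, chunk by chunk.

    Hypothetical sketch; the default values are assumptions.
    """
    # Damping is defined per second, so scale it to the chunk duration.
    per_chunk = damping ** seconds_per_chunk
    for chunk in chunks:
        target = rms16(chunk) * ratio
        threshold = threshold * per_chunk + target * (1 - per_chunk)
    return threshold


# With the real library (assumes PyAudio and a working microphone):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as source:
#       r.adjust_for_ambient_noise(source, duration=1)
#   print(r.energy_threshold)
```

Applying the damping per second (`damping ** seconds_per_chunk`) keeps the adaptation speed roughly independent of how the audio is split into chunks.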

Maintenance & Community

The project is seeking collaborators as of February 2022. The primary author is Anthony Zhang. Bug reports and suggestions can be made via the issue tracker.

Licensing & Compatibility

The library is released under the 3-clause BSD license. Language files from CMU Sphinx are BSD-licensed. Binaries from FLAC are GPLv2-licensed, but the library's usage of them is considered a "mere aggregation" and does not impose GPL restrictions on the library itself or programs using it.

Limitations & Caveats

The maintainer has noted limited time for reviewing pull requests and issues and is seeking collaborators. Some cloud integrations need their own authentication setup; for example, the Google Cloud Speech API requires local authentication credentials. Only version 1 of the Google Cloud Speech API is supported, not version 2.

Health Check

Last Commit

1 week ago

Responsiveness

1 week

Star History

14 stars in the last 30 days