cheetah by Picovoice

On-device streaming speech-to-text engine for private, real-time transcription

Created 7 years ago
651 stars

Top 51.4% on SourcePulse

Project Summary

Summary

Cheetah is an on-device, deep-learning-powered streaming speech-to-text engine. It gives developers a private, accurate, compact, and computationally efficient solution that runs entirely locally and supports a wide range of platforms, from desktops to embedded systems.

How It Works

The engine processes audio in real time using deep learning models that run directly on the user's device. Its streaming architecture delivers continuous transcription and endpoint detection locally. The main advantages are privacy, since no audio leaves the device, and performance efficient enough for resource-constrained environments.
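The streaming loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's own code: it assumes an engine object whose `process(frame)` returns a `(partial_transcript, is_endpoint)` pair and whose `flush()` returns any remaining text, which is how the Python SDK appears to behave. `StreamingEngine`, `split_frames`, `transcribe`, and `FakeEngine` are hypothetical names introduced here, and `FakeEngine` is only a stand-in so the sketch runs without the SDK or an AccessKey.

```python
from typing import Iterable, Iterator, List, Protocol, Sequence, Tuple


class StreamingEngine(Protocol):
    """Assumed engine shape: process() per audio frame, flush() at boundaries."""

    def process(self, pcm: Sequence[int]) -> Tuple[str, bool]: ...

    def flush(self) -> str: ...


def split_frames(pcm: Sequence[int], frame_length: int) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-size frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]


def transcribe(engine: StreamingEngine, frames: Iterable[Sequence[int]]) -> List[str]:
    """Feed frames one at a time and return one transcript per utterance."""
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        current += partial
        if is_endpoint:
            # Endpoint detected: collect any text the engine still buffers.
            current += engine.flush()
            if current.strip():
                utterances.append(current.strip())
            current = ""
    # End of stream: flush whatever is left over.
    current += engine.flush()
    if current.strip():
        utterances.append(current.strip())
    return utterances


class FakeEngine:
    """Hypothetical stand-in: one word per voiced frame, endpoint on silence."""

    def __init__(self, words: Sequence[str]) -> None:
        self._words = iter(words)

    def process(self, pcm: Sequence[int]) -> Tuple[str, bool]:
        if not any(pcm):  # an all-zero frame is treated as silence
            return "", True
        return next(self._words, "") + " ", False

    def flush(self) -> str:
        return ""


if __name__ == "__main__":
    samples = [1, 1, 2, 2, 0, 0, 3, 3, 5]
    engine = FakeEngine(["hello", "world", "again"])
    print(transcribe(engine, split_frames(samples, frame_length=2)))
```

With a real engine, the frame size would come from the engine itself rather than a hard-coded value, and the audio would come from a microphone or file reader instead of a list.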

Quick Start & Requirements

SDKs for Python, Node.js, React Native, Flutter, iOS, Android, and .NET are distributed through standard package managers (pip, npm, yarn, CocoaPods, Gradle, NuGet). Demos often require cloning the repository. An AccessKey from the Picovoice Console is required, and validating it needs an internet connection. Some integrations also require model files.
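As a rough sketch of the AccessKey requirement in Python: the helper below reads the key from an environment variable and only then creates the engine. It assumes the SDK is installed as the `pvcheetah` package and exposes `pvcheetah.create(access_key=...)`. The function name `create_cheetah` and the variable name `PICOVOICE_ACCESS_KEY` are hypothetical choices made for this example, not something the project prescribes.

```python
import os


def create_cheetah(env_var: str = "PICOVOICE_ACCESS_KEY"):
    """Create an engine using an AccessKey read from the environment.

    The environment variable name is an assumption made for this sketch.
    """
    access_key = os.environ.get(env_var)
    if not access_key:
        raise RuntimeError(f"Set {env_var} to your Picovoice Console AccessKey")
    # Imported lazily so this module still loads when the SDK is not installed.
    import pvcheetah  # assumed to be installed with: pip install pvcheetah

    # Key validation happens at creation time and needs an internet connection.
    return pvcheetah.create(access_key=access_key)
```

Keeping the key out of source code also makes it easier to rotate without touching the application.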

Highlighted Details

  • Privacy-First: All voice processing is strictly on-device.
  • Broad Platform Reach: Supports Linux, macOS, Windows, Android, iOS, Raspberry Pi, and major web browsers.
  • Efficient Performance: Compact and computationally efficient design.
  • Multi-Language Support: English by default; French, German, Italian, Portuguese, Spanish available. Additional languages require commercial licensing.

Maintenance & Community

Regular releases (e.g., v3.0.0, v2.1.0) indicate active maintenance, with ongoing feature additions such as GPU support. Specific community channels are not detailed.

Licensing & Compatibility

The specific open-source license is not explicitly stated in the provided README. Commercial use may necessitate a subscription for higher usage limits or custom language models.

Limitations & Caveats

An AccessKey is mandatory for all deployments, and validating it requires internet access even though audio processing itself runs offline. Language support on the free tier is limited; additional languages require commercial licensing.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 17
  • Issues (30d): 0

Star History

10 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Travis Fischer (founder of Agentic).

RealtimeSTT by KoljaB

0.2%
9k
Speech-to-text library for realtime applications
Created 2 years ago
Updated 6 months ago