leopard by Picovoice

Private, on-device speech-to-text engine

Created 6 years ago
469 stars

Top 64.8% on SourcePulse

Project Summary

Leopard is an on-device speech-to-text engine, built on deep learning, for privacy-conscious applications. It offers accurate, compact, and computationally efficient transcription on desktop, mobile, embedded systems, and web browsers. Because audio is processed locally, transcription itself needs no cloud connection, which keeps voice data private and avoids network latency.

How It Works

Leopard runs a deep learning model optimized for on-device execution, so all voice processing happens on the user's device and sensitive audio never leaves the user's environment. The design emphasizes efficiency: the engine is meant to stay accurate on resource-constrained platforms such as Raspberry Pi without heavy computational demands.

Quick Start & Requirements

  • Installation: SDKs are available via package managers (e.g., pip3 install pvleopard for Python, yarn add @picovoice/leopard-node for Node.js, pod 'Leopard-iOS' for iOS). Demos often require specific package installations or build steps.
  • Prerequisites: A Picovoice AccessKey, obtained from the Picovoice Console, is mandatory for authentication and authorization. Internet connectivity is required for initial AccessKey validation. Specific demos may require build tools (CMake, Flutter SDK, React Native environment, Android Studio, Xcode, .NET SDK, Yarn/npm).
  • Links: Demos for each platform live inside the repository (e.g., demo/python, demo/c, demo/ios). Official documentation and the Picovoice Console are not linked directly here; the AccessKey requirement implies both.
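
As a rough illustration of the Python quick start above, the sketch below transcribes one audio file. It assumes the `pvleopard` package exposes `create(access_key=...)`, `process_file(...)`, and `delete()`, as in the SDK's published Python demos; the `transcribe_file` helper and its `create_engine` parameter are hypothetical names introduced here, not part of the SDK.

```python
def transcribe_file(access_key, audio_path, create_engine=None):
    """Transcribe one audio file and return (transcript, words).

    `create_engine` is a hypothetical hook for swapping in a stand-in
    engine; by default the real SDK factory is used.
    """
    if create_engine is None:
        # Imported lazily so this module loads even without the SDK installed.
        import pvleopard  # installed with: pip3 install pvleopard
        create_engine = pvleopard.create

    # Creating the engine is the step that validates the AccessKey.
    engine = create_engine(access_key=access_key)
    try:
        # process_file returns the transcript plus per-word metadata.
        transcript, words = engine.process_file(audio_path)
        return transcript, words
    finally:
        # Release the native resources held by the engine.
        engine.delete()
```

Calling `transcribe_file(access_key, "audio.wav")` with a valid AccessKey should return the transcript and word list. The first call needs internet access for AccessKey validation, while the transcription itself runs locally.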

Highlighted Details

  • Extensive Platform Support: Runs on Linux, macOS, Windows, Android, iOS, Raspberry Pi (3, 4, 5), and major web browsers (Chrome, Safari, Firefox, Edge).
  • Language Support: Includes English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish. Additional languages are available for commercial customers.
  • Advanced Features: Supports optional automatic punctuation insertion and speaker diarization (added in v2.0.0).
  • Output Granularity: Provides full transcriptions, word-level timestamps, confidence scores, and speaker tags.
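
To make the word-level output above more concrete, here is a minimal sketch that groups words into speaker turns. The `Word` class is a hypothetical stand-in whose field names (`word`, `start_sec`, `end_sec`, `confidence`, `speaker_tag`) are assumed to mirror what the Python SDK returns; `speaker_turns` and `min_confidence` are illustrative names, not SDK features.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    """Hypothetical stand-in mirroring the per-word fields listed above."""
    word: str
    start_sec: float
    end_sec: float
    confidence: float
    speaker_tag: int


def speaker_turns(words, min_confidence=0.0):
    """Group consecutive words by speaker into (tag, start, end, text) turns."""
    # Drop words the engine was not confident about.
    kept = [w for w in words if w.confidence >= min_confidence]
    turns = []
    # groupby merges only adjacent words that share a speaker tag,
    # so a speaker who talks twice yields two separate turns.
    for tag, run in groupby(kept, key=lambda w: w.speaker_tag):
        run = list(run)
        text = " ".join(w.word for w in run)
        turns.append((tag, run[0].start_sec, run[-1].end_sec, text))
    return turns
```

With the real SDK, speaker tags and punctuation would presumably need to be switched on when the engine is created (the Python package appears to use `enable_diarization=True` and `enable_automatic_punctuation=True` for this); the word list returned by the engine could then be passed to `speaker_turns` in place of the stand-in objects.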

Maintenance & Community

The project publishes versioned releases (e.g., v3.0.0, v2.0.0, v1.2.0), indicating active development and feature additions such as GPU support and improved accuracy. The README does not list community channels such as Discord or Slack.

Licensing & Compatibility

The README does not explicitly state an open-source license. The mandatory AccessKey and the README's mentions of "Free Tier usage rights" and a "subscription plan" suggest a commercial licensing model, potentially with limits on usage or distribution. Commercial use requires adherence to Picovoice's terms.

Limitations & Caveats

An internet connection is required for AccessKey validation, even though transcription is offline. The specific terms of the commercial license and free tier limits are not detailed within the README and would require consulting Picovoice's official licensing documentation.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 14
  • Issues (30d): 2
  • Star History: 3 stars in the last 30 days
