leopard by Picovoice

Private, on-device speech-to-text engine

Created 6 years ago

472 stars

Top 64.6% on SourcePulse

Project Summary

Leopard is an on-device, deep learning-powered speech-to-text engine designed for privacy-conscious applications. It offers accurate, compact, and computationally efficient transcription across a wide range of platforms, including desktop, mobile, embedded systems, and web browsers. Because audio is processed locally, transcription itself needs no cloud connection, which improves data privacy and reduces latency.

How It Works

Leopard runs a deep learning model optimized for on-device execution, so all voice processing happens locally on the user's device. This architecture prioritizes privacy and security by keeping sensitive audio data within the user's environment. The design emphasizes efficiency, targeting high accuracy with modest compute even on resource-constrained platforms such as Raspberry Pi.

Quick Start & Requirements

Installation: SDKs are available via package managers (e.g., pip3 install pvleopard for Python, yarn add @picovoice/leopard-node for Node.js, pod 'Leopard-iOS' for iOS). Demos often require specific package installations or build steps.
Prerequisites: A Picovoice AccessKey is mandatory for authentication and authorization, obtained from the Picovoice Console. Internet connectivity is required for initial AccessKey validation. Specific demos may require build tools (CMake, Flutter SDK, React Native environment, Android Studio, Xcode, .NET SDK, Yarn/npm).
Links: Demos are available for various platforms in the repository (e.g., demo/python, demo/c, demo/ios). The AccessKey requirement implies that official documentation and account access are reached through the Picovoice Console.
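As a rough illustration of the Python path described above, the sketch below wraps a single file transcription in a helper. It assumes the pvleopard package is installed and that its create, process_file, and delete calls behave as in the Python SDK; the helper name transcribe is hypothetical, and the two optional flags are switched on only to show where they go.

```python
def transcribe(access_key: str, audio_path: str):
    """Transcribe one audio file and return (transcript, words).

    A minimal sketch: `transcribe` is a hypothetical helper, not part of the SDK.
    """
    # Imported lazily so this module still loads where pvleopard is not installed.
    import pvleopard

    # Both flags are optional; they are enabled here only for illustration.
    leopard = pvleopard.create(
        access_key=access_key,
        enable_automatic_punctuation=True,
        enable_diarization=True,
    )
    try:
        transcript, words = leopard.process_file(audio_path)
        return transcript, words
    finally:
        # Release the native resources held by the engine.
        leopard.delete()
```

A call such as transcribe(os.environ["PICOVOICE_ACCESS_KEY"], "sample.wav") would then return the transcript and the per-word records; the environment variable name and file name are placeholders. The try/finally keeps the engine from leaking if processing fails.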

Highlighted Details

Extensive Platform Support: Runs on Linux, macOS, Windows, Android, iOS, Raspberry Pi (3, 4, 5), and major web browsers (Chrome, Safari, Firefox, Edge).
Language Support: Includes English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish. Additional languages are available for commercial customers.
Advanced Features: Supports optional automatic punctuation insertion and speaker diarization (added in v2.0.0).
Output Granularity: Provides full transcriptions, word-level timestamps, confidence scores, and speaker tags.
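To make the word-level output more concrete, here is a small sketch that merges consecutive words from the same speaker into turns. The Word record is a local stand-in whose field names (word, start_sec, end_sec, confidence, speaker_tag) are assumed to mirror the per-word records the engine returns; group_by_speaker is a hypothetical helper, not an SDK function.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    # Local stand-in for the engine's per-word records; field names are assumed.
    word: str
    start_sec: float
    end_sec: float
    confidence: float
    speaker_tag: int


def group_by_speaker(words: List[Word]) -> List[Tuple[int, float, float, str]]:
    """Merge consecutive words with the same speaker tag into turns.

    Each turn is (speaker_tag, start_sec, end_sec, text).
    """
    turns: List[Tuple[int, float, float, str]] = []
    for w in words:
        if turns and turns[-1][0] == w.speaker_tag:
            # Same speaker as the previous word: extend the current turn.
            tag, start, _, text = turns[-1]
            turns[-1] = (tag, start, w.end_sec, text + " " + w.word)
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append((w.speaker_tag, w.start_sec, w.end_sec, w.word))
    return turns
```

Because the helper only reads attributes, it should work equally on the stand-in records and on any objects that expose the same five fields.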

Maintenance & Community

The project has a clear release history with versioning (e.g., v3.0.0, v2.0.0, v1.2.0), indicating active development and feature additions like GPU support and improved accuracy. Specific community links (Discord, Slack) are not detailed in the README.

Licensing & Compatibility

The README does not explicitly state an open-source license. The mandatory AccessKey requirement and the mentions of "Free Tier usage rights" and a "subscription plan" suggest a commercial licensing model, potentially with limits on usage or distribution for non-commercial purposes. Commercial use requires adherence to Picovoice's terms.

Limitations & Caveats

An internet connection is required for AccessKey validation, even though transcription itself runs offline. The README does not detail the commercial license terms or the free tier limits; those require consulting Picovoice's official licensing documentation.
