vosk-api by alphacep

Offline speech recognition for 20+ languages

Created 6 years ago

14,926 stars

Top 3.5% on SourcePulse

Project Summary

Vosk-api provides an offline, open-source speech recognition toolkit for a wide range of platforms including Android, iOS, Raspberry Pi, and servers. It supports over 20 languages and dialects, offering continuous, large-vocabulary transcription with zero-latency streaming and reconfigurable vocabulary. The toolkit is designed for applications such as chatbots, smart home devices, virtual assistants, and for generating subtitles or transcriptions.

How It Works

Vosk uses small models (around 50MB) for continuous, large-vocabulary speech recognition. Its streaming API returns results as audio arrives, which the project describes as zero-latency and which makes real-time transcription possible. The toolkit also supports vocabulary reconfiguration and speaker identification, so it can be adapted to different use cases. Bindings are available for several programming languages, including Python, Java, Node.js, C#, C++, Rust, and Go.
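As a rough illustration of the streaming flow described above, the following minimal Python sketch reads a WAV file in small chunks and feeds each one to a recognizer. It assumes the `vosk` Python package is installed and that a model has been unpacked locally; the function names `pcm_chunks` and `transcribe`, the `model_dir` parameter, and the chunk size are hypothetical choices made for this example, not part of the project. The input is assumed to be mono 16-bit PCM.

```python
import json
import wave
from typing import Iterator, List


def pcm_chunks(wav_path: str, frames_per_chunk: int = 4000) -> Iterator[bytes]:
    """Yield raw PCM chunks from a WAV file, imitating a live audio stream."""
    with wave.open(wav_path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def transcribe(wav_path: str, model_dir: str) -> List[str]:
    """Feed chunks to a Vosk recognizer and collect the finalized utterances.

    model_dir is a hypothetical path to an unpacked Vosk model directory.
    """
    # Imported lazily so the chunking helper works without Vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        sample_rate = wf.getframerate()

    recognizer = KaldiRecognizer(Model(model_dir), sample_rate)
    utterances = []
    for chunk in pcm_chunks(wav_path):
        if recognizer.AcceptWaveform(chunk):
            # True signals that the recognizer finalized an utterance.
            utterances.append(json.loads(recognizer.Result())["text"])
        else:
            # A partial hypothesis is available while audio is still arriving.
            partial = json.loads(recognizer.PartialResult())["partial"]
            print("partial:", partial)
    # Flush whatever audio is still buffered at the end of the stream.
    utterances.append(json.loads(recognizer.FinalResult())["text"])
    return utterances
```

In a live setting the chunks would come from a microphone or network socket rather than a file; the recognizer loop stays the same.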

Quick Start & Requirements

Installation and detailed documentation are available on the Vosk Website.

Highlighted Details

Supports 20+ languages and dialects with ongoing expansion.
Small model size (approx. 50MB) suitable for resource-constrained devices.
Zero-latency streaming API for real-time transcription.
Scalable from embedded devices (Raspberry Pi, Android) to server clusters.
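On small devices, the reconfigurable vocabulary is often used to restrict recognition to a short list of commands. The sketch below shows one way this might look with the Python bindings, which accept a JSON list of phrases as an optional third argument to `KaldiRecognizer`. The helper names `build_grammar` and `make_command_recognizer` and the `model_dir` parameter are hypothetical, and the assumption that the chosen model supports runtime vocabulary changes (typically the small models) should be checked against the model's documentation.

```python
import json
from typing import Iterable


def build_grammar(phrases: Iterable[str]) -> str:
    """Serialize allowed phrases as a JSON list suitable for use as a grammar."""
    cleaned = [p.strip().lower() for p in phrases if p.strip()]
    # "[unk]" gives the recognizer a label for speech outside the phrase list.
    return json.dumps(cleaned + ["[unk]"])


def make_command_recognizer(model_dir: str, phrases: Iterable[str],
                            sample_rate: int = 16000):
    """Create a recognizer limited to the given phrases.

    model_dir is a hypothetical path to an unpacked Vosk model directory.
    """
    # Imported lazily so build_grammar can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    return KaldiRecognizer(Model(model_dir), sample_rate, build_grammar(phrases))
```

For example, `build_grammar(["Turn On", "lights off"])` produces a grammar string that a smart home application could pass to the recognizer to limit output to those two commands.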

Maintenance & Community

This summary does not cover contributors, sponsorships, or community channels.

Licensing & Compatibility

This summary does not state the license type or compatibility details.

Limitations & Caveats

This summary does not list any specific limitations, caveats, or known issues for vosk-api.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 7

Star History

108 stars in the last 30 days

Explore Similar Projects

whisper.el by natrys

Emacs Speech-to-Text integration

Created 3 years ago

Updated 3 months ago

Synthalingua by cyberofficial

Real-time translation tool using AI for audio transcription and translation

Created 3 years ago

Updated 1 month ago

ZONOS2 by Zyphra

High-quality, multilingual text-to-speech synthesis

Created 1 month ago

Updated 5 days ago

ollamafreeapi by mfoud444

Free distributed API for accessing open-source LLMs

Created 1 year ago

Updated 2 months ago

tiny-tts by tronghieuit

Ultra-lightweight English Text-to-Speech model

Created 4 months ago

Updated 3 months ago

CrispASR by CrispStrobe

Unified C++ speech engine for ASR and TTS

Created 3 months ago

Updated 14 hours ago

cheetah by Picovoice

On-device streaming speech-to-text engine for private, real-time transcription

Created 7 years ago

Updated 23 hours ago

StreamSpeech by ictnlp

All-in-one model for simultaneous speech tasks (ACL 2024 paper)

Created 2 years ago

Updated 1 year ago

sherpa-ncnn by k2-fsa

Offline STT engine for real-time speech recognition and VAD

Created 3 years ago

Updated 8 months ago

tract by sonos

Tiny, self-contained inference engine for diverse hardware and modalities

Created 9 years ago

Updated 15 hours ago

kokoro-onnx by thewh1teagle

Text-to-speech with ONNX Runtime

Created 1 year ago

Updated 6 days ago

vosk-server by alphacep

Offline speech recognition server

Created 7 years ago

Updated 11 months ago
