vosk-browser by ccoreilly

Speech recognition for the browser

Created 4 years ago

500 stars

Top 62.2% on SourcePulse

Project Summary

Vosk-Browser is a JavaScript library that enables speech recognition directly within web browsers by leveraging a WebAssembly build of the Vosk speech recognition toolkit. It is designed for web developers who want to integrate real-time speech-to-text capabilities into their applications without relying on server-side processing. The library offers an easy-to-use API for handling microphone input and audio files, supporting multiple languages.

How It Works

The library runs a WebAssembly build of Vosk inside a Web Worker. This moves the computationally intensive recognition work off the main browser thread, which prevents UI freezes. The library also handles the Web Worker communication and audio processing internally, so developers interact with the Vosk engine through a small, straightforward interface.
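The offloading pattern described above can be sketched as a thin proxy on the main thread that forwards audio to a worker and receives results back as messages. This is a generic illustration, not vosk-browser's internal protocol: the message shapes (`action`, `event`), the `RecognizerProxy` class, and `makeWorkerHandler` are all hypothetical names chosen for the example.

```javascript
// Hypothetical message protocol for illustration only; vosk-browser's real
// internal messages are not documented in this summary.

// Worker side: turn incoming audio chunks into result messages.
// `recognize` stands in for the WebAssembly engine; `reply` sends a message back.
function makeWorkerHandler(recognize, reply) {
  return (event) => {
    const { action, samples } = event.data;
    if (action === "audioChunk") {
      reply({ event: "result", text: recognize(samples) });
    }
  };
}

// Main-thread side: hide postMessage/onmessage behind a small API so that
// application code never talks to the worker directly.
class RecognizerProxy {
  constructor(port) {
    this.port = port; // a Worker (or anything with postMessage/onmessage)
    this.listeners = [];
    port.onmessage = (event) => {
      if (event.data.event === "result") {
        this.listeners.forEach((fn) => fn(event.data.text));
      }
    };
  }

  onResult(fn) {
    this.listeners.push(fn);
  }

  acceptSamples(samples) {
    this.port.postMessage({ action: "audioChunk", samples });
  }
}
```

In a browser, `port` would be a `Worker` instance, and the worker script would install the handler with `self.onmessage = makeWorkerHandler(recognize, (msg) => self.postMessage(msg))`. Because the heavy `recognize` call runs only inside the worker, the main thread is limited to posting chunks and reacting to results.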

Quick Start & Requirements

Installation: Install via npm: npm i vosk-browser. Alternatively, use a CDN like jsDelivr.
Prerequisites: None explicitly mentioned beyond a modern web browser.
Usage: Load the library and initialize Vosk with a model (e.g., model.tar.gz). The provided example demonstrates capturing microphone input and processing it for speech recognition.
Demo: A live demo is available at https://ccoreilly.github.io/vosk-browser/.
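A minimal sketch of the usage step, modeled on the microphone example in the project README. It assumes the vosk-browser bundle exposes a global `Vosk` with `createModel`, and that the model object provides a `KaldiRecognizer` with `on` and `acceptWaveform`. The sample-rate constructor argument reflects later releases and should be checked against the installed version. The `wireRecognizer` helper and the `onText` callback are hypothetical names introduced for this example.

```javascript
// Sketch only: assumes the global `Vosk` provided by the vosk-browser bundle.

// Attach listeners for final and partial results.
// `onText(text, isFinal)` is a hypothetical callback supplied by the caller.
function wireRecognizer(recognizer, onText) {
  recognizer.on("result", (message) => onText(message.result.text, true));
  recognizer.on("partialresult", (message) => onText(message.result.partial, false));
  return recognizer;
}

// Browser-only: load a model, open the microphone, and stream audio to Vosk.
async function startMicrophoneRecognition(modelUrl, onText) {
  const model = await Vosk.createModel(modelUrl); // e.g. "model.tar.gz"
  const audioContext = new AudioContext();

  // Assumption: the recognizer takes the input sample rate as its argument.
  const recognizer = wireRecognizer(
    new model.KaldiRecognizer(audioContext.sampleRate),
    onText
  );

  const stream = await navigator.mediaDevices.getUserMedia({
    video: false,
    audio: { channelCount: 1, echoCancellation: true, noiseSuppression: true },
  });
  const source = audioContext.createMediaStreamSource(stream);

  // ScriptProcessorNode is deprecated, but it is the simplest way to obtain
  // AudioBuffer chunks that can be handed to acceptWaveform.
  const processor = audioContext.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (event) => {
    try {
      recognizer.acceptWaveform(event.inputBuffer);
    } catch (error) {
      console.error("acceptWaveform failed", error);
    }
  };
  source.connect(processor);
  // Some browsers only fire onaudioprocess when the node reaches a destination.
  processor.connect(audioContext.destination);

  return recognizer;
}
```

A page would call `startMicrophoneRecognition("model.tar.gz", (text, isFinal) => ...)` from a user gesture such as a button click, since browsers typically require one before granting microphone access or starting an `AudioContext`.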

Highlighted Details

Supports 13 languages for speech recognition.
Designed to run speech recognition within a Web Worker for non-blocking performance.
Includes examples for microphone input and audio file processing.

Maintenance & Community

The project appears to be maintained by ccoreilly.
A "Todos" section in the README indicates planned improvements such as automated publishing and better documentation.

Licensing & Compatibility

The license is not explicitly stated in the provided README.

Limitations & Caveats

The README describes its approach as "somewhat opinionated", which suggests built-in design choices that may limit flexibility for some use cases.
The project is still marked with "Todos" for testing and documentation, suggesting it may not be fully production-ready or stable.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

6 stars in the last 30 days

Explore Similar Projects

whisper.php by CodeWithKyrian

PHP binding for local speech-to-text, leveraging whisper.cpp

Created 1 year ago

Updated 1 month ago

bumblebee by jaxcore

JavaScript framework for local voice applications

Created 5 years ago

Updated 2 years ago

Starred by

Jeremy Howard (Cofounder of fast.ai).

on-device-transcription by Hugo-Dz

Minimal app for on-device speech-to-text conversion

Created 1 year ago

Updated 1 year ago

kokoro-web by eduardolat

Free AI text-to-speech web app, self-hostable with OpenAI API compatibility

Created 11 months ago

Updated 10 months ago

Starred by

Simon Willison (Coauthor of Django).

prompt-api by webmachinelearning

Web API proposal for prompting browser-provided language models

Created 1 year ago

Updated 1 month ago

openai-realtime-api-nextjs by cameronking4

Next.js starter for OpenAI Realtime API voice apps

Created 1 year ago

Updated 9 months ago

Starred by

Luis Capelo (Cofounder of Lightning AI).

hertz-dev by Standard-Intelligence

Open-source base model for full-duplex conversational audio

Created 1 year ago

Updated 1 year ago

rhasspy by rhasspy

Offline private voice assistant

Created 6 years ago

Updated 8 months ago

pocketsphinx.js by syl22-00

Speech recognition in JavaScript and WebAssembly

Created 12 years ago

Updated 5 years ago

vosk-server by alphacep

Offline speech recognition server

Created 6 years ago

Updated 5 months ago

Starred by

Taranjeet Singh (Cofounder of Mem0).

speechgpt by hahahumble

Web app for conversing with ChatGPT via speech

Created 2 years ago

Updated 2 years ago

Starred by

Tim J. Baek (Founder of Open WebUI).

WhisperLiveKit by QuentinFuxa

Python package for real-time, local speech-to-text

Created 1 year ago

Updated 2 days ago
