Speech recognition in JavaScript and WebAssembly
This project provides a speech recognition engine that runs entirely within a web browser using JavaScript and WebAssembly, based on the PocketSphinx C library. It's designed for web developers and researchers looking to integrate speech input into web applications without relying on server-side processing, offering offline capabilities and direct microphone access.
How It Works
The core of the project is PocketSphinx, a C-based speech recognition engine, compiled to JavaScript and WebAssembly using Emscripten so that it runs efficiently in the browser. An accompanying audioRecorder.js library, built on the Web Audio API, handles microphone input, sample rate conversion, and data buffering, then feeds the resulting audio to the PocketSphinx engine. To avoid blocking the UI thread, the recognizer.js wrapper runs the speech recognition process in the background using Web Workers.
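The sample rate conversion step can be illustrated with a short sketch. This is not the code from audioRecorder.js; the function name downsampleToInt16 and its parameters are hypothetical, and it assumes mono input and a 16-bit PCM target at a lower rate (16 kHz is a common choice for PocketSphinx acoustic models). It averages each run of input samples that maps onto one output sample, which is a simple rather than high-quality resampling method.

```javascript
// Hypothetical helper for illustration only; not part of pocketsphinx.js.
// Converts mono Float32 samples in [-1, 1] (the format the Web Audio API
// delivers) into 16-bit signed PCM at a lower sample rate.
function downsampleToInt16(input, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError('outputRate must not exceed inputRate');
  }
  const ratio = inputRate / outputRate;
  const outputLength = Math.floor(input.length / ratio);
  const output = new Int16Array(outputLength);

  for (let i = 0; i < outputLength; i++) {
    // Range of input samples that collapses into output sample i.
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);

    // Average the range as a crude low-pass filter.
    let sum = 0;
    for (let j = start; j < end; j++) {
      sum += input[j];
    }
    const mean = end > start ? sum / (end - start) : 0;

    // Clamp to [-1, 1] and scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, mean));
    output[i] = Math.round(clamped * 32767);
  }
  return output;
}

// Usage: halve the rate of a four-sample buffer.
const pcm = downsampleToInt16(new Float32Array([1, 1, -1, -1]), 32000, 16000);
console.log(Array.from(pcm)); // prints [ 32767, -32767 ]
```

In a real pipeline, a function like this would be called on each buffer delivered by the audio callback, and the resulting Int16Array would be posted to the worker that hosts the recognizer.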
Quick Start & Requirements
Clone the repository with its submodules: git clone --recursive https://github.com/syl22-00/pocketsphinx.js.git
Serve the webapp/live.html file using a local web server (e.g., python server.py) and open it in a browser. Chrome may require the --disable-web-security flag.
Documentation is in the doc/ directory.
Highlighted Details
audioRecorder.js module for microphone input and processing.
recognizer.js wrapper for efficient use within Web Workers.
Maintenance & Community
The project appears to have had its last significant update around 2017. The README does not link to any active community channels such as Discord or Slack.
Licensing & Compatibility
The core PocketSphinx.js library and associated files are licensed under the MIT License. The audioRecorder.js and audioRecorderWorker.js files are based on Recorder.js, also under the MIT License. This permissive licensing allows for commercial use and integration into closed-source applications.
Limitations & Caveats
The project's last commit was in 2017, suggesting potential maintenance gaps and compatibility issues with modern browser APIs or Emscripten versions. Performance and accuracy are highly dependent on acoustic and language models, and initial results may be poor without proper tuning.