openWakeWord  by dscripka

Open-source wakeword detection library for voice-enabled apps

created 3 years ago
1,301 stars

Top 31.3% on sourcepulse

Project Summary

This project provides an open-source framework for wake word detection in voice-enabled applications. It offers pre-trained models for common English phrases and focuses on performance and ease of use in real-world settings.

How It Works

The framework uses a three-component architecture: an ONNX-based melspectrogram pre-processing function; a shared feature extraction backbone (re-implemented from a Google TFHub module) that generates speech embeddings; and a classification model (e.g., fully connected or RNN) that detects the wake word. Models process audio in 80 ms frames, and the modular design keeps processing efficient and makes individual components easier to modify.
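
To make the 80 ms frame size concrete, here is a minimal sketch of the framing arithmetic. It assumes 16 kHz mono input, which this summary does not state; `split_into_frames` is a hypothetical helper for illustration, not part of the library.

```python
# Assumption: 16 kHz mono audio. The summary states only the 80 ms frame length.
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 80
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples in one 80 ms frame


def split_into_frames(samples, frame_samples=FRAME_SAMPLES):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_samples + 1, frame_samples):
        yield samples[start:start + frame_samples]


if __name__ == "__main__":
    one_second = [0] * SAMPLE_RATE_HZ  # one second of silence
    frames = list(split_into_frames(one_second))
    print(FRAME_SAMPLES, len(frames))
```

Under that assumption, each frame holds 1280 samples, so one second of audio yields 12 full frames with a 640-sample remainder left over for the next read.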

Quick Start & Requirements

  • Install via pip: pip install openwakeword
  • On Linux, the optional Speex noise suppression requires sudo apt-get install libspeexdsp-dev.
  • Supports Python 3.8+.
  • Pre-trained models can be downloaded via openwakeword.utils.download_models().
  • Online demo available on HuggingFace Spaces.
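
The steps above might be combined roughly as follows. This is a sketch, not the project's official example: the 0.5 threshold, the 1280-sample frame (80 ms at an assumed 16 kHz, 16-bit mono) and the helpers `triggered` and `run_on_silence` are illustrative assumptions. It also assumes that `Model()` with no arguments loads the pre-trained models and that `predict` returns a mapping from model name to score.

```python
def triggered(scores, threshold=0.5):
    """Return the names whose score meets the threshold (0.5 is an illustrative value)."""
    return sorted(name for name, score in scores.items() if score >= threshold)


def run_on_silence(n_frames=10):
    """Feed silent frames to the pre-trained models; needs `pip install openwakeword`."""
    import numpy as np
    import openwakeword
    from openwakeword.model import Model

    openwakeword.utils.download_models()  # one-time download of the pre-trained models
    model = Model()  # assumed: no arguments loads the pre-trained models
    frame = np.zeros(1280, dtype=np.int16)  # assumed: 80 ms of 16 kHz, 16-bit mono audio
    hits = []
    for _ in range(n_frames):
        scores = model.predict(frame)  # assumed: dict of model name -> score
        hits.extend(triggered(scores))
    return hits
```

Calling `run_on_silence()` downloads the models on first use. In a real application the silent frame would be replaced by audio read from a microphone or file.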

Highlighted Details

  • Claims performance competitive with Picovoice Porcupine (a commercial offering) and Mycroft Precise (an open-source one).
  • Models are trained on synthetic data, demonstrating robustness to whispered speech, varied speaking speeds, and phrasing variations.
  • Includes optional Speex noise suppression and Silero VAD integration for improved performance in noisy environments.
  • Offers a simplified training process with Google Colab notebooks, allowing custom model creation in under an hour.

Maintenance & Community

  • Releases are noted up to February 2024.
  • Community contributions acknowledged, including a Docker implementation by @dalehumby and a C++ version by @synesthesiam.
  • Links to examples, training notebooks, and community discussions are provided.

Licensing & Compatibility

  • Code is licensed under Apache 2.0.
  • Pre-trained models are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International, restricting commercial use.

Limitations & Caveats

The project is English-only due to reliance on English TTS models for training data. It is not recommended for highly constrained edge devices or microcontrollers, with alternatives like microWakeWord suggested for such use cases. Commercial use of pre-trained models is restricted by the CC-BY-NC-SA 4.0 license.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 2
  • Star history: 196 stars in the last 90 days
