EfficientWord-Net by Ant-Brain

Hotword detection engine for custom voice assistants

Created 4 years ago
312 stars

Top 86.1% on SourcePulse

Project Summary

EfficientWord-Net is a Python-based hotword detection engine for home assistants and other applications, enabling custom wake-word activation with few-shot learning. It targets developers seeking to integrate custom hotwords without significant overhead, leveraging TFLite for efficient real-time inference.

How It Works

The engine is inspired by FaceNet's Siamese network architecture and uses the Resnet_50_Arc_loss model. Instead of retraining for each new hotword, it compares incoming audio against references generated from user-provided samples of that hotword, and it is reported to reach high accuracy with as few as 3-4 samples. Inference runs on TFLite, which keeps it fast enough for real-time applications.
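
The comparison against references can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the use of cosine similarity, and the 0.7 threshold are all assumptions chosen for illustration, and the embeddings are plain lists of floats standing in for the model's output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_against_references(embedding, references):
    """Best similarity between one embedding and a list of reference embeddings."""
    return max(cosine_similarity(embedding, ref) for ref in references)


def is_match(embedding, references, threshold=0.7):
    """Hypothetical decision rule: match when the best score reaches the threshold."""
    return score_against_references(embedding, references) >= threshold


# Toy 2-D "embeddings"; a real model would produce much longer vectors.
references = [[1.0, 0.0], [0.0, 1.0]]
print(is_match([0.9, 0.1], references))   # close to the first reference
print(is_match([-1.0, 0.0], references))  # far from both references
```

Taking the maximum over references means that each additional sample only adds another vector to compare against, which is one plausible reason a handful of samples can be enough.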

Quick Start & Requirements

  • Install via pip: pip install EfficientWord-Net
  • Prerequisites: PyAudio (requires PortAudio), TFLite. Librosa is optional for inference but required for generating reference files. macOS and Raspberry Pi users may need to compile dependencies.
  • Python versions: 3.6 to 3.9.
  • Demo: python -m eff_word_net.engine
  • Documentation: https://ant-brain.github.io/EfficientWord-Net/
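
Because the supported Python range is narrow, a check before installing can save a failed setup. The helper below is a hypothetical sketch and not part of the package; it only encodes the 3.6 to 3.9 range listed above.

```python
import sys


def is_supported(version_info=sys.version_info, minimum=(3, 6), maximum=(3, 9)):
    """Return True when the major.minor version lies within the supported range."""
    major_minor = tuple(version_info[:2])
    return minimum <= major_minor <= maximum


if __name__ == "__main__":
    if is_supported():
        print("Python version is within the listed range.")
    else:
        print("Python version is outside the listed 3.6 to 3.9 range.")
```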

Highlighted Details

  • Supports custom hotword creation using user-provided audio samples.
  • Offers out-of-the-box embeddings for common hotwords like "Google," "Alexa," and "Siri."
  • Includes a MultiHotwordDetector for simultaneous detection of multiple hotwords.
  • The newer Resnet_50_Arc_loss model offers improved noise resilience and requires fewer samples than the older First_Iteration_Siamese model.
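
Detecting several hotwords at once can be pictured as scoring each audio frame against every registered hotword and keeping the best score above a threshold. The sketch below illustrates that idea only; it does not reproduce the MultiHotwordDetector API, and the `best_hotword` function, the scorer callables, and the 0.7 threshold are assumptions.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A scorer maps one frame (here, any sequence of floats) to a confidence in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def best_hotword(
    frame: Sequence[float],
    scorers: Dict[str, Scorer],
    threshold: float = 0.7,
) -> Optional[Tuple[str, float]]:
    """Return (hotword, score) for the highest score at or above threshold, else None."""
    best: Optional[Tuple[str, float]] = None
    for name, scorer in scorers.items():
        score = scorer(frame)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best


# Stand-in scorers with fixed outputs; real ones would run the model on the frame.
scorers = {"alexa": lambda frame: 0.6, "siri": lambda frame: 0.9}
print(best_hotword([0.0], scorers))
```

Returning only the single best match avoids firing two hotwords on the same frame, at the cost of ignoring a second hotword that also clears the threshold.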

Maintenance & Community

  • This is an undergraduate project that is seeking community support and contributions.
  • Discussions are available for feedback and feature requests.
  • TODOs include adding an audio file handler, removing the Librosa requirement for edge devices, and supporting model fine-tuning.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The current model is trained on single words and may exhibit unexpected behavior with phrases. The audio processing window is limited to 1.5 seconds, making it less effective for longer hotwords. The Resnet_50_Arc_loss model (approx. 88MB) is too large for microcontrollers like Arduino, though pruned versions are planned.
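
The effect of the 1.5-second window can be seen in a small sliding-window sketch. Only the 1.5-second window length comes from the text above; the 16 kHz sample rate, the 0.75-second hop, and the `sliding_windows` helper are assumptions made for illustration.

```python
def sliding_windows(samples, sample_rate=16000, window_secs=1.5, hop_secs=0.75):
    """Yield fixed-length windows over samples, advancing by hop_secs each step."""
    window = int(sample_rate * window_secs)
    hop = int(sample_rate * hop_secs)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    # Stop once a full window no longer fits in the remaining samples.
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]


# Two seconds of placeholder audio: only one full 1.5-second window fits,
# so the final half second never appears in any window.
two_seconds = [0.0] * 32000
print(len(list(sliding_windows(two_seconds))))
```

A hotword longer than the window is never seen in full by any single window, which is consistent with the reduced effectiveness on longer hotwords described above.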

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
