EfficientWord-Net by Ant-Brain

Hotword detection engine for custom voice assistants

Created 4 years ago

312 stars

Top 86.1% on SourcePulse

Project Summary

EfficientWord-Net is a Python-based hotword detection engine for home assistants and similar applications. It uses few-shot learning so that a custom wake word can be added from a handful of audio samples, and it runs inference through TFLite to keep real-time detection lightweight. It targets developers who want custom hotwords without collecting a large training set.

How It Works

The engine is inspired by FaceNet's Siamese Network architecture and uses a Resnet_50_Arc_loss model. Instead of classifying audio directly, it compares incoming audio against references built from user-provided hotword samples, which lets it reach high accuracy with as few as 3-4 samples. The TFLite implementation keeps inference fast enough for real-time applications.
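The comparison step described above can be sketched in a few lines. This is a minimal illustration of Siamese-style matching, not the project's actual code: the function names, the toy 3-dimensional vectors, the use of cosine similarity, and the 0.7 threshold are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_against_references(embedding, references):
    """Best similarity between one audio embedding and the few reference embeddings."""
    return max(cosine_similarity(embedding, ref) for ref in references)

def is_hotword(embedding, references, threshold=0.7):
    # threshold is an illustrative value, not one taken from the project
    return score_against_references(embedding, references) >= threshold

# Toy "embeddings"; a real model would emit much larger vectors.
# Three references stand in for the 3-4 user-provided samples.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
match_candidate = [0.95, 0.05, 0.0]   # close to the references
other_candidate = [0.0, 1.0, 0.0]     # points in a different direction

print(is_hotword(match_candidate, references))
print(is_hotword(other_candidate, references))
```

Because the decision depends only on distances between embeddings, adding a new hotword in this scheme means adding new reference vectors rather than retraining the network.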

Quick Start & Requirements

Install via pip: pip install EfficientWord-Net
Prerequisites: PyAudio (requires PortAudio), TFLite. Librosa is optional for inference but required for generating reference files. macOS and Raspberry Pi users may need to compile dependencies.
Python versions: 3.6 to 3.9.
Demo: python -m eff_word_net.engine
Documentation: https://ant-brain.github.io/EfficientWord-Net/

Highlighted Details

Supports custom hotword creation using user-provided audio samples.
Offers out-of-the-box embeddings for common hotwords like "Google," "Alexa," and "Siri."
Includes a MultiHotwordDetector for simultaneous detection of multiple hotwords.
The newer Resnet_50_Arc_loss model offers improved noise resilience and requires fewer samples than the older First_Iteration_Siamese model.
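To illustrate what simultaneous detection of several hotwords involves, here is a hedged sketch in which one embedding per audio frame is scored against the reference sets of every registered hotword. It does not reproduce the library's MultiHotwordDetector; `detect_hotword`, the dictionary layout, the toy vectors, and the 0.7 threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def detect_hotword(embedding, hotword_refs, threshold=0.7):
    """Return (name, score) for the best-matching hotword, or (None, score) if none passes."""
    best_name, best_score = None, -1.0
    for name, refs in hotword_refs.items():
        # Score this hotword by its closest reference embedding.
        score = max(cosine_similarity(embedding, ref) for ref in refs)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Hypothetical reference embeddings for two hotwords.
hotword_refs = {
    "alexa": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "siri": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}

print(detect_hotword([0.05, 0.95, 0.0], hotword_refs))
print(detect_hotword([0.0, 0.0, 1.0], hotword_refs))
```

In a design like this the expensive model call happens once per frame, and only the cheap similarity comparison is repeated per hotword.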

Maintenance & Community

This is an undergraduate project that welcomes community support and contributions.
Discussions are available for feedback and feature requests.
TODOs include adding an audio file handler, removing the Librosa requirement for edge devices, and supporting model fine-tuning.

Licensing & Compatibility

License: Apache License 2.0.
Permits commercial use and linking from closed-source software.

Limitations & Caveats

The current model is trained on single words and may exhibit unexpected behavior with phrases. The audio processing window is limited to 1.5 seconds, making it less effective for longer hotwords. The Resnet_50_Arc_loss model (approx. 88MB) is too large for microcontrollers like Arduino, though pruned versions are planned.
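The effect of the fixed 1.5-second window can be seen in a small sketch of how a stream might be cut into frames. Only the 1.5-second window length comes from the text above; the 16 kHz sample rate, the 0.75-second hop, and the function names are assumptions for illustration.

```python
def sliding_windows(samples, sample_rate=16000, window_secs=1.5, hop_secs=0.75):
    """Yield fixed-length, overlapping windows over a sequence of audio samples."""
    window = int(sample_rate * window_secs)
    hop = int(sample_rate * hop_secs)
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]

def fits_in_window(utterance_secs, window_secs=1.5):
    """True if a whole utterance of this duration can sit inside a single window."""
    return utterance_secs <= window_secs

# Three seconds of silence at the assumed 16 kHz sample rate.
audio = [0.0] * (16000 * 3)
windows = list(sliding_windows(audio))

print(len(windows), len(windows[0]))
print(fits_in_window(1.0), fits_in_window(2.2))
```

Any hotword that takes longer to say than the window is never seen in full by the model in a single frame, which is consistent with the reduced effectiveness on longer hotwords noted above.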

