mic_array  by respeaker

Mic array utils for audio processing

created 8 years ago
313 stars

Top 87.4% on sourcepulse

Project Summary

This repository provides utilities for the ReSpeaker Microphone Array, enabling Direction of Arrival (DOA) estimation, Voice Activity Detection (VAD), and Keyword Spotting (KWS). It targets developers and researchers working with multi-microphone arrays for audio processing, voice control, and spatial awareness applications. Its main benefit is that these three features are already wired to the ReSpeaker hardware, so they run directly on the array's raw multi-channel audio.

How It Works

The project leverages the 8-channel raw audio output from the ReSpeaker hardware. For DOA, it likely employs beamforming or similar spatial audio techniques to pinpoint sound sources. VAD is implemented using the WebRTC VAD library for efficient speech detection. KWS is integrated with the Snowboy engine for wake-word recognition. The scripts demonstrate how to control the device's LED ring and process audio streams for these functionalities.
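
To make the DOA idea concrete, here is a minimal sketch of one common approach: estimate the time delay between two microphones by cross-correlation, then convert that delay to an angle. This is not taken from the repository. It uses plain time-domain cross-correlation, and the names estimate_delay and delay_to_angle, the 0.1 m microphone spacing, and the 16 kHz sample rate are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def estimate_delay(sig, ref, max_lag):
    """Return the lag (in samples) by which `sig` trails `ref`.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    A positive result means the sound reached `ref` first.
    """
    n = min(len(sig), len(ref))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i - lag
            if 0 <= j < n:
                score += sig[i] * ref[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_distance):
    """Convert a delay in samples to an arrival angle in degrees.

    Uses the far-field relation sin(theta) = tau * c / d for a single
    microphone pair; the ratio is clamped so noise cannot push it
    outside the domain of asin.
    """
    tau = lag / sample_rate
    ratio = tau * SPEED_OF_SOUND / mic_distance
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # Synthetic test signal: a short pulse, and a copy delayed by 3 samples.
    ref = [0.0] * 10 + [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * 10
    sig = [0.0] * 3 + ref[:-3]
    lag = estimate_delay(sig, ref, max_lag=5)
    # Hypothetical geometry: mics 0.1 m apart, 16 kHz sampling.
    print(lag, delay_to_angle(lag, sample_rate=16000, mic_distance=0.1))
```

A single pair only resolves an angle within a half-plane; a circular array would typically combine several pairs to cover the full 360 degrees.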

Quick Start & Requirements

  • Install: sudo pip install pyusb for pixel ring control; pip install webrtcvad for VAD. Snowboy requires sudo apt-get install python-dev libatlas-base-dev swig and manual compilation.
  • Prerequisites: ReSpeaker USB Mic Array with firmware updated for 8-channel raw audio output. For 4-mic arrays, modify scripts. Python 3 is assumed.
  • Setup: Basic setup involves installing Python packages and potentially configuring udev rules for USB access. Snowboy compilation can take several minutes.
  • Links: mic_array_dfu, Google Assistant Library, ODAS, ODAS Studio
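
Once the packages above are installed, a typical first step is to pull one channel out of the 8-channel stream and cut it into frames for the VAD. The sketch below is an illustration, not code from the repository. It assumes 16-bit interleaved PCM at 16 kHz, and the helper names (extract_channel, split_frames, speech_flags) and the choice of channel index are hypothetical. The webrtcvad calls used are Vad(mode) and is_speech(frame, sample_rate), which accept 10, 20, or 30 ms frames of 16-bit mono audio.

```python
def extract_channel(interleaved, channels, index, sample_width=2):
    """Pick one channel out of interleaved PCM bytes.

    `index` is the zero-based channel to keep; which physical microphone
    it maps to depends on the firmware and is not assumed here.
    """
    frame_size = channels * sample_width
    out = bytearray()
    for start in range(0, len(interleaved) - frame_size + 1, frame_size):
        offset = start + index * sample_width
        out += interleaved[offset:offset + sample_width]
    return bytes(out)


def split_frames(pcm, sample_rate, frame_ms=30, sample_width=2):
    """Yield fixed-length mono frames, dropping any trailing partial frame."""
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def speech_flags(pcm, sample_rate=16000, aggressiveness=2):
    """Return one True/False speech decision per 30 ms frame.

    Imported lazily so the helpers above work without webrtcvad installed.
    """
    import webrtcvad  # third-party: pip install webrtcvad

    vad = webrtcvad.Vad(aggressiveness)  # 0 (least) to 3 (most aggressive)
    return [vad.is_speech(frame, sample_rate)
            for frame in split_frames(pcm, sample_rate)]


if __name__ == "__main__":
    # One second of silent 8-channel audio stands in for a real capture.
    raw = bytes(16000 * 8 * 2)
    mono = extract_channel(raw, channels=8, index=0)
    print(len(list(split_frames(mono, 16000))), "frames ready for the VAD")
```

For a 4-mic array, the same helpers would apply with a different channels value, subject to how that device lays out its stream.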

Highlighted Details

  • Integrates DOA, VAD, and KWS on ReSpeaker hardware.
  • Includes a script (pixel_ring.py) for controlling the device's LED ring via USB HID.
  • Provides examples for integrating with Google Assistant and the ODAS (Open embeddeD Audition System) framework for advanced sound source localization.
  • Requires specific firmware flashing for full 8-channel audio support.

Maintenance & Community

The repository is maintained by respeaker. Links to community resources like Discord or Slack are not explicitly provided in the README.

Licensing & Compatibility

The repository itself appears to be under a permissive license, but the integrated Snowboy KWS engine has its own licensing terms which may impact commercial use. Compatibility with closed-source applications would depend on the licensing of Snowboy and any other third-party components.

Limitations & Caveats

The README notes potential issues with SWIG versions during Snowboy compilation, requiring manual Makefile edits. Full functionality, especially 8-channel audio, depends on flashing specific device firmware. The project's reliance on Snowboy, which is no longer actively maintained by its original developers, may pose a long-term risk.

Health Check

  • Last commit: 7 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days
