athena-signal by athena-team

Speech signal processing library for research/engineering projects

Created 5 years ago

543 stars

Top 58.7% on SourcePulse

Project Summary

Athena-signal is an open-source C library with Python bindings for speech signal processing, targeting researchers and engineers. It provides implementations of core algorithms like Acoustic Echo Cancellation (AEC), Noise Suppression (NS), Direction of Arrival (DOA), and various beamforming techniques (MVDR, GSC), enabling enhanced audio capture and processing in custom projects.

How It Works

The core signal processing is written in C for performance and exposed to Python through SWIG-generated bindings. Key modules include AEC with multiple cancellation stages, a high-pass filter (HPF) built from cascaded IIR filters, DOA estimation using the Capon (MVDR) algorithm, and MVDR and GSC beamformers. Noise estimation uses the MCRA method, and voice activity detection (VAD) is integrated with double-talk detection. Each module can be switched on or off individually, so pipelines can be assembled flexibly.
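To make the "cascaded IIR filters" idea concrete, here is a minimal pure-Python sketch of a high-pass filter built from identical second-order (biquad) sections run one after another. It is not athena-signal's code: the function names, the 100 Hz cutoff, the two-stage count, and the coefficient formulas (the widely used Audio EQ Cookbook high-pass design) are all assumptions chosen for illustration.

```python
import math


def highpass_biquad(cutoff_hz, sample_rate, q=0.7071):
    """Return normalized (b0, b1, b2, a1, a2) for one second-order high-pass section.

    Coefficients follow the Audio EQ Cookbook high-pass design (an assumption,
    not taken from athena-signal).
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = (1.0 + cos_w0) / 2.0 / a0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    return b0, b1, b2, a1, a2


def run_biquad(samples, coeffs):
    """Filter a sequence with a single biquad in direct form I."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def cascaded_highpass(samples, cutoff_hz, sample_rate, stages=2):
    """Run the same high-pass biquad `stages` times in series."""
    coeffs = highpass_biquad(cutoff_hz, sample_rate)
    for _ in range(stages):
        samples = run_biquad(samples, coeffs)
    return samples


# Illustrative input: a 1 kHz tone riding on a constant (DC) offset.
fs = 16000
signal = [0.5 + math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(4000)]
filtered = cascaded_highpass(signal, cutoff_hz=100.0, sample_rate=fs)
```

Each added stage steepens the roll-off below the cutoff, which is the usual reason to cascade sections instead of designing one high-order filter directly. In the example, the DC offset is removed while the 1 kHz tone passes almost unchanged.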

Quick Start & Requirements

Install: Build wheel from source (swig -python athena_signal/dios_signal.i, python setup.py bdist_wheel sdist), then pip install --ignore-installed dist/athena_signal-*.whl.
Prerequisites: Python 3.x, SWIG, NumPy, Setuptools.
Configuration: Requires manual setting of microphone count, reference channels, and microphone coordinates (3D) for beamforming/DOA modules.
Examples: examples/athena_signal_test.py demonstrates usage.
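The configuration step above asks for 3D microphone coordinates. The sketch below shows one way such coordinates might be generated for a uniform circular array, and how they translate into the per-microphone arrival-time offsets that beamforming and DOA methods rely on. The helper names, the list-of-tuples layout, the 5 cm radius, and the 343 m/s speed of sound are assumptions for illustration. The format athena-signal actually expects should be checked against examples/athena_signal_test.py.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def circular_array(num_mics, radius_m):
    """Return [(x, y, z), ...] for mics evenly spaced on a circle in the z = 0 plane.

    Hypothetical helper: the layout is illustrative, not athena-signal's format.
    """
    coords = []
    for m in range(num_mics):
        angle = 2.0 * math.pi * m / num_mics
        coords.append((radius_m * math.cos(angle), radius_m * math.sin(angle), 0.0))
    return coords


def far_field_delays(mic_coord, azimuth_deg, elevation_deg=0.0):
    """Arrival-time offset in seconds at each mic, relative to the array origin,
    for a plane wave arriving from the given direction.

    Negative values mean the wavefront reaches that mic before the origin.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector pointing from the array origin towards the source.
    u = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    return [-(x * u[0] + y * u[1] + z * u[2]) / SPEED_OF_SOUND
            for (x, y, z) in mic_coord]


# Illustrative geometry: six mics on a 5 cm circle, source at azimuth 0 degrees.
mic_coord = circular_array(6, 0.05)
delays = far_field_delays(mic_coord, azimuth_deg=0.0)
```

Because the delays depend only on the dot product between each position and the source direction, the same calculation works for any array shape, not just circular ones.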

Highlighted Details

Implements Acoustic Echo Cancellation (AEC), Noise Suppression (NS), Automatic Gain Control (AGC), High Pass Filter (HPF), Direction of Arrival (DOA), MVDR, and GSC.
MVDR and GSC support arbitrary microphone array geometries via user-defined mic_coord.
Noise reduction based on "Minima Controlled Recursive Averaging" (MCRA).
Modules are individually switchable for flexible pipeline configuration.
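As a rough illustration of the MCRA idea mentioned above, the following sketch tracks the noise power of a single frequency bin. It is a simplified version written for this summary, not athena-signal's implementation: it omits the frequency smoothing and gain stages of a full noise suppressor, and the function name and every parameter value (smoothing factors, threshold, window length) are assumptions.

```python
def mcra_noise_track(power_frames, alpha_s=0.8, alpha_d=0.95, alpha_p=0.2,
                     delta=5.0, window=50):
    """Track the noise power of one frequency bin with a simplified MCRA recursion.

    power_frames: per-frame signal power |Y|^2 for a single bin.
    Returns the noise power estimate after each frame.
    """
    first = power_frames[0]
    smoothed = s_min = s_tmp = noise = first
    presence = 0.0
    estimates = []
    for frame_idx, power in enumerate(power_frames):
        # 1. Smooth the observed power recursively over time.
        smoothed = alpha_s * smoothed + (1.0 - alpha_s) * power
        # 2. Track the minimum of the smoothed power; restart the search
        #    every `window` frames so the minimum can follow rising noise.
        if frame_idx > 0 and frame_idx % window == 0:
            s_min = min(s_tmp, smoothed)
            s_tmp = smoothed
        else:
            s_min = min(s_min, smoothed)
            s_tmp = min(s_tmp, smoothed)
        # 3. Rough speech decision: power well above the tracked minimum.
        indicator = 1.0 if smoothed > delta * s_min else 0.0
        # 4. Smooth the decision into a speech presence probability.
        presence = alpha_p * presence + (1.0 - alpha_p) * indicator
        # 5. Update the noise estimate, slowing the update when speech is likely.
        alpha_eff = alpha_d + (1.0 - alpha_d) * presence
        noise = alpha_eff * noise + (1.0 - alpha_eff) * power
        estimates.append(noise)
    return estimates


# Illustrative input: steady noise, a loud burst standing in for speech, then noise.
frames = [1.0] * 200 + [100.0] * 30 + [1.0] * 200
noise_estimates = mcra_noise_track(frames)
```

The key design choice is step 5: instead of freezing the estimate with a hard voice-activity switch, the smoothing factor moves towards 1 as the speech presence probability rises, so the estimate barely changes during the burst and resumes tracking once it ends.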

Maintenance & Community

Open to contributions via issues and pull requests.
Contact information for questions and suggestions is provided.
Acknowledges contributions from WebRTC and Speex.

Licensing & Compatibility

The README does not explicitly state a license, so the licensing terms should be confirmed before commercial use or closed-source integration.

Limitations & Caveats

The library is primarily implemented in C, requiring SWIG for Python bindings, which can complicate builds.
Microphone array geometry (mic_coord) must be manually specified for advanced modules.
No explicit benchmarks or performance metrics are provided in the README.

Health Check

Last Commit: 4 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1

Star History

2 stars in the last 30 days

Explore Similar Projects

LiveWhisper by Nikorasu

Live transcription tool using OpenAI's Whisper

Created 3 years ago

Updated 3 months ago

Starred by Georgi Gerganov (Author of llama.cpp, whisper.cpp).

pywhispercpp by absadiki

Python bindings for whisper.cpp, enabling local speech transcription

Created 2 years ago

Updated 4 days ago

AIVoiceChat by KoljaB

Voice chat for low-latency AI companion interaction

Created 2 years ago

Updated 4 months ago

Starred by Travis Fischer (Founder of Agentic).

ollama-voice-mac by apeatling

Offline voice assistant for macOS

Created 1 year ago

Updated 2 months ago

Starred by Jong Wook Kim (Research Scientist at OpenAI).

realtime-transcription-fastrtc by sofdog-gh

Real-time transcription tool using local Whisper models

Created 8 months ago

Updated 3 months ago

Starred by Alexander Borzunov (Research Scientist at OpenAI).

speech_course by yandexdataschool

Speech processing course materials

Created 4 years ago

Updated 3 months ago

mic_array by respeaker

Mic array utils for audio processing

Created 8 years ago

Updated 7 years ago

whisper_mic by mallorbc

Microphone interface for OpenAI's Whisper speech-to-text model

Created 3 years ago

Updated 1 year ago

Starred by Jason Liu (Author of Instructor).

stable-ts by jianfch

SDK for enhanced audio transcription using OpenAI's Whisper

Created 3 years ago

Updated 1 week ago

Starred by Tim J. Baek (Founder of Open WebUI), Gabriel Almeida (Cofounder of Langflow), and 2 more.

whisper-diarization by MahmoudAshraf97

ASR pipeline for speaker diarization

Created 2 years ago

Updated 3 weeks ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Travis Fischer (Founder of Agentic).

RealtimeSTT by KoljaB

Speech-to-text library for realtime applications

Created 2 years ago

Updated 3 months ago

sherpa-onnx by k2-fsa

Speech toolkit for local, offline speech AI tasks via ONNX

Created 3 years ago

Updated 21 hours ago
