seld-net by sharathadavanne

Sound event localization, detection, and tracking using CRNN

Created 7 years ago

376 stars

Top 75.6% on SourcePulse

Project Summary

This repository provides SELDnet, a convolutional recurrent neural network for sound event localization, detection, and tracking (SELDT) of multiple, potentially overlapping and moving sound sources in 2D spherical space. It is designed for researchers and engineers working on advanced audio signal processing and spatial audio analysis, offering a unified approach to these tasks.

How It Works

SELDnet employs a CRNN architecture that processes sequences of spectrogram frames. It simultaneously performs multi-label classification for sound event detection (SED) and multi-output regression for direction of arrival (DOA) estimation. The SED output is thresholded for binary activity detection, and the corresponding DOA estimates provide the spatial location of active sound events. This dual-output approach allows for joint optimization of temporal activity and spatial localization.
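The two output branches described above can be sketched in NumPy. This is a minimal illustration, not the repository's code: the function names `decode_seld` and `joint_loss`, the array shapes, the (azimuth, elevation) layout in degrees, the 0.5 threshold, and the weighting of binary cross-entropy against mean squared error are all assumptions made for the example.

```python
import numpy as np


def decode_seld(sed_probs, doa_deg, threshold=0.5):
    """Threshold SED probabilities and read off the DOA of each active event.

    sed_probs: (frames, classes) activity probabilities in [0, 1].
    doa_deg:   (frames, classes, 2) assumed (azimuth, elevation) in degrees.
    Returns the boolean activity mask and a list of
    (frame, class, azimuth, elevation) tuples for active entries only.
    """
    sed_probs = np.asarray(sed_probs, dtype=float)
    doa_deg = np.asarray(doa_deg, dtype=float)
    active = sed_probs > threshold  # binary activity detection
    events = []
    for frame, cls in zip(*np.nonzero(active)):
        azi, ele = doa_deg[frame, cls]
        events.append((int(frame), int(cls), float(azi), float(ele)))
    return active, events


def joint_loss(sed_true, sed_pred, doa_true, doa_pred, doa_weight=1.0, eps=1e-7):
    """Hypothetical joint objective: BCE on SED plus weighted MSE on DOA."""
    sed_true = np.asarray(sed_true, dtype=float)
    sed_pred = np.clip(np.asarray(sed_pred, dtype=float), eps, 1.0 - eps)
    doa_true = np.asarray(doa_true, dtype=float)
    doa_pred = np.asarray(doa_pred, dtype=float)
    bce = -np.mean(sed_true * np.log(sed_pred)
                   + (1.0 - sed_true) * np.log(1.0 - sed_pred))
    mse = np.mean((doa_true - doa_pred) ** 2)
    return bce + doa_weight * mse


# Two frames, two classes: class 0 is active in frame 0, class 1 in frame 1.
sed = [[0.9, 0.1], [0.2, 0.7]]
doa = [[[30.0, 10.0], [0.0, 0.0]], [[0.0, 0.0], [-45.0, 5.0]]]
mask, events = decode_seld(sed, doa)
for frame, cls, azi, ele in events:
    print(f"frame {frame}: class {cls} at azimuth {azi}, elevation {ele}")
```

DOA values of inactive classes are ignored during decoding, which is why the regression output only needs to be meaningful where the SED branch reports activity.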

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt
Python version: 3.7.3
Datasets: Download from Zenodo (30-45 GB total).
Feature extraction: python batch_feature_extraction.py
Training: python seld.py
Official quick-start and code: https://github.com/sharathadavanne/seld-net

Highlighted Details

Addresses sound event localization, detection, and tracking (SELDT) in 2D spherical space.
Handles multiple, overlapping, and moving sound sources.
Achieves tracking performance comparable to Bayesian trackers such as RBMCDA particle filters.
Includes multiple simulated and real-life datasets for stationary and moving sources.

Maintenance & Community

The project is associated with the research challenges of the IEEE AASP DCASE workshop. Key contributors are listed in the paper citations.

Licensing & Compatibility

License: TUT License.
Compatibility: Suitable for research and academic use. Commercial use implications depend on the specific terms of the TUT License.

Limitations & Caveats

The provided code is described as a "simple vanilla code without much frills." The larger datasets are available upon request. The project is research-focused, and the repository makes no explicit claim of production readiness.

