Sound event localization, detection, and tracking using CRNN
This repository provides SELDnet, a convolutional recurrent neural network (CRNN) for sound event localization, detection, and tracking (SELDT) of multiple, potentially overlapping and moving sound sources in 2D spherical space. It is aimed at researchers and engineers working on audio signal processing and spatial audio analysis, and it handles all three tasks with a single network.
How It Works
SELDnet employs a CRNN architecture that processes sequences of spectrogram frames. It simultaneously performs multi-label classification for sound event detection (SED) and multi-output regression for direction of arrival (DOA) estimation. The SED output is thresholded for binary activity detection, and the corresponding DOA estimates provide the spatial location of active sound events. This dual-output approach allows for joint optimization of temporal activity and spatial localization.
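The thresholding step described above can be sketched in a few lines of NumPy. This is only an illustration, not the repository's code: the function name `decode_seld_output`, the 0.5 threshold, the array shapes, and the (azimuth, elevation) layout of the DOA output are all assumptions made for the example.

```python
import numpy as np


def decode_seld_output(sed_probs, doa_est, threshold=0.5):
    """Hypothetical helper (not part of the repository).

    sed_probs: (frames, classes) activity probabilities from the SED branch.
    doa_est:   (frames, classes, 2) regressed directions from the DOA branch,
               assumed here to be (azimuth, elevation) in degrees.
    threshold: assumed decision threshold for binary activity.
    """
    sed_probs = np.asarray(sed_probs, dtype=float)
    doa_est = np.asarray(doa_est, dtype=float)

    # Binary activity per frame and class.
    active = sed_probs > threshold

    # Keep DOA estimates only where the matching class is active.
    events = []
    for frame_idx, class_idx in zip(*np.nonzero(active)):
        direction = tuple(float(v) for v in doa_est[frame_idx, class_idx])
        events.append((int(frame_idx), int(class_idx), direction))
    return active, events


if __name__ == "__main__":
    # Two frames, two classes, made-up values.
    sed = [[0.9, 0.2], [0.4, 0.7]]
    doa = [[[30.0, 10.0], [0.0, 0.0]], [[0.0, 0.0], [-45.0, 5.0]]]
    _, found = decode_seld_output(sed, doa)
    for frame, cls, direction in found:
        print(frame, cls, direction)
```

DOA values for inactive classes are simply ignored, which mirrors the idea that the regression output is only meaningful where the detection output reports an active event.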
Quick Start & Requirements
pip install -r requirements.txt
python batch_feature_extraction.py
python seld.py
Highlighted Details
Maintenance & Community
The project is associated with the research challenges of the IEEE AASP workshop on Detection and Classification of Acoustic Scenes and Events (DCASE). Key contributors are listed in the paper citations.
Licensing & Compatibility
Limitations & Caveats
The authors describe the provided code as "simple vanilla code without much frills." The larger datasets are available upon request. The project is research-focused, and the repository makes no explicit claim of production readiness.