seld-net by sharathadavanne

Sound event localization, detection, and tracking using CRNN

created 7 years ago
364 stars

Top 78.4% on sourcepulse

Project Summary

This repository provides SELDnet, a convolutional recurrent neural network for sound event localization, detection, and tracking (SELDT) of multiple, potentially overlapping and moving sound sources in 2D spherical space. It is designed for researchers and engineers working on advanced audio signal processing and spatial audio analysis, offering a unified approach to these tasks.

How It Works

SELDnet employs a CRNN architecture that processes sequences of spectrogram frames. It simultaneously performs multi-label classification for sound event detection (SED) and multi-output regression for direction of arrival (DOA) estimation. The SED output is thresholded for binary activity detection, and the corresponding DOA estimates provide the spatial location of active sound events. This dual-output approach allows for joint optimization of temporal activity and spatial localization.
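The thresholding step described above can be illustrated with a minimal sketch. It is not taken from the repository: the function names, the 0.5 threshold, and the per-class (azimuth, elevation) output layout are assumptions made for this example, and the actual network may represent DOA differently.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout, assumed for illustration only:
#   sed_probs[c] -> probability that class c is active in the frame
#   doa_est[c]   -> (azimuth_deg, elevation_deg) regressed for class c
Doa = Tuple[float, float]


def decode_frame(
    sed_probs: Sequence[float],
    doa_est: Sequence[Doa],
    threshold: float = 0.5,  # assumed value, not from the source
) -> Dict[int, Doa]:
    """Keep the DOA estimate of every class whose SED probability exceeds the threshold."""
    return {c: doa_est[c] for c, p in enumerate(sed_probs) if p > threshold}


def decode_sequence(
    sed_seq: Sequence[Sequence[float]],
    doa_seq: Sequence[Sequence[Doa]],
    threshold: float = 0.5,
) -> List[Dict[int, Doa]]:
    """Apply decode_frame to each frame of a sequence of network outputs."""
    return [decode_frame(p, d, threshold) for p, d in zip(sed_seq, doa_seq)]


# Two frames, three classes, made-up numbers.
example_sed = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]]
example_doa = [
    [(30.0, 10.0), (-120.0, 0.0), (75.0, -20.0)],
    [(32.0, 10.0), (-118.0, 5.0), (80.0, -20.0)],
]
example_events = decode_sequence(example_sed, example_doa)
```

Here each frame yields a mapping from active class index to its estimated direction, so inactive classes contribute no location even though the regression branch always produces a value for them.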

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Python version: 3.7.3
  • Datasets: Download from Zenodo (30-45 GB total).
  • Feature extraction: python batch_feature_extraction.py
  • Training: python seld.py
  • Official quick-start and code: https://github.com/sharathadavanne/seld-net

Highlighted Details

  • Addresses sound event localization, detection, and tracking (SELDT) in 2D spherical space.
  • Handles multiple, overlapping, and moving sound sources.
  • Reports tracking performance comparable to Bayesian trackers such as RBMCDA particle filters.
  • Includes multiple simulated and real-life datasets for stationary and moving sources.

Maintenance & Community

The project is associated with the research challenges of the IEEE AASP DCASE workshop. Key contributors are named in the cited papers.

Licensing & Compatibility

  • License: TUT License.
  • Compatibility: Suitable for research and academic use. Commercial use implications depend on the specific terms of the TUT License.

Limitations & Caveats

The authors describe the provided code as "simple vanilla code without much frills." The larger datasets are available only upon request. The project is research-focused, and the repository makes no explicit claim of production readiness.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 90 days
