voice_activity_detection  by filippogiruzzi

VAD tool using deep learning & TensorFlow

created 5 years ago
368 stars

Top 77.8% on sourcepulse

Project Summary

This project provides a deep learning-based Voice Activity Detection (VAD) system for researchers and developers who need to separate speech from noise in audio signals. It pairs MFCC features with a 1D-Resnet model and reports 97% test accuracy, which makes it a candidate front end for voice-enabled applications.

How It Works

The system extracts Mel-Frequency Cepstral Coefficients (MFCCs) from audio segments, turning the raw waveform into a compact sequence of spectral feature vectors suitable for deep learning. A 1D-Resnet then classifies these features as speech or noise. The 1D convolutions slide along the time axis of the MFCC sequence, and the residual (skip) connections that define a ResNet make deeper stacks of such layers easier to train.
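The two stages described above can be sketched in plain Python. This is an illustrative sketch only, not the project's code: the function names, the frame length, hop size, filter count, and the single-channel residual block are all assumptions chosen to keep the example small, and the real project relies on TensorFlow for these steps.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (fractional bin edges)."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points: left edge, filter centers, right edge.
    edges = [mel_to_hz(max_mel * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            if left < b <= center:
                row.append((b - left) / (center - left))
            elif center < b < right:
                row.append((right - b) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (all defaults are assumptions)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Log mel energies, floored to avoid log(0) on empty filters.
        log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                        for row in bank]
        # DCT-II decorrelates the log mel energies into cepstral coefficients.
        features.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)
        ])
    return features

def conv1d_same(seq, kernel):
    """1D cross-correlation with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(seq))]

def residual_block(seq, kernel_a, kernel_b):
    """y = relu(x + conv(relu(conv(x)))); the skip connection adds x back in."""
    hidden = [max(0.0, v) for v in conv1d_same(seq, kernel_a)]
    out = conv1d_same(hidden, kernel_b)
    return [max(0.0, x + o) for x, o in zip(seq, out)]
```

A hypothetical usage would call `mfcc(samples, 8000)` on a list of audio samples and then pass the time series of one coefficient through `residual_block` with two odd-length kernels. A real 1D-Resnet uses many channels, learned kernels, and a final classification layer, none of which are shown here.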

Quick Start & Requirements

  • Installation: inside a Python 3.7.3 virtual environment, run pip install -r requirements.txt followed by pip install -e . (note the trailing dot). Docker is also supported (docker pull filippogrz/tf-vad:latest).
  • Prerequisites: Ubuntu 20.04, Python 3.7.3, TensorFlow 1.15.4. Requires the LibriSpeech ASR corpus dataset.
  • Setup: Virtual environment setup recommended. Docker build may take time.
  • Links: LibriSpeech Dataset
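Because the prerequisites pin specific versions, a quick pre-flight check can catch a mismatched environment before installation or training. The sketch below is not part of the project: `check_environment` and `installed_tf_version` are hypothetical helper names, and treating any 3.7.x Python and any 1.15.x TensorFlow as acceptable is an assumption.

```python
import sys

EXPECTED_PYTHON = (3, 7)   # from the listed prerequisite Python 3.7.3
EXPECTED_TF = "1.15"       # from the listed prerequisite TensorFlow 1.15.4

def check_environment(python_version, tf_version):
    """Return a list of human-readable mismatches (an empty list means OK)."""
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor != EXPECTED_PYTHON:
        problems.append("Python %d.%d found, 3.7 expected" % major_minor)
    if tf_version is None:
        problems.append("TensorFlow is not installed")
    elif not tf_version.startswith(EXPECTED_TF + "."):
        problems.append("TensorFlow %s found, 1.15.x expected" % tf_version)
    return problems

def installed_tf_version():
    """Return the installed TensorFlow version string, or None if missing."""
    try:
        import tensorflow as tf
        return tf.__version__
    except ImportError:
        return None

if __name__ == "__main__":
    for problem in check_environment(sys.version_info, installed_tf_version()):
        print("WARNING:", problem)
```

Running the script in the target virtual environment prints one warning per mismatch and nothing when both versions line up.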

Highlighted Details

  • Achieves 97% test accuracy with a 1D-Resnet model.
  • Includes scripts for dataset labeling, TFRecord conversion, model training, and inference.
  • Supports GPU acceleration via TensorFlow.

Maintenance & Community

No specific community channels or notable contributors are mentioned in the README. The project appears to be a personal or academic endeavor.

Licensing & Compatibility

The README does not explicitly state a license, so the terms for reusing this code are unclear. TensorFlow 1.15.4, its main dependency, is released under the Apache 2.0 license, which permits commercial use.

Limitations & Caveats

The project targets Ubuntu 20.04 and Python 3.7.3 and depends on TensorFlow 1.15.4, a release from the legacy 1.x line. Several features are still listed under "Todo," including online inference and comparison against baseline models, so they are not yet implemented.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
