VAD tool using deep learning & TensorFlow
This project provides a deep learning-based Voice Activity Detection (VAD) system for researchers and developers who need to distinguish speech from noise in audio signals. It pairs a 1D-ResNet model with MFCC features for high accuracy, offering a robust building block for voice-enabled applications.
How It Works
The system extracts Mel-Frequency Cepstral Coefficients (MFCCs) from audio segments, transforming time-series audio data into a format suitable for deep learning. A 1D-Resnet architecture is then employed to classify these features, distinguishing between speech and noise. This approach is advantageous due to the ResNet's ability to handle sequential data effectively and its proven performance in classification tasks.
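As a rough illustration of this pipeline, the sketch below extracts MFCCs with librosa and feeds them through a small 1D residual network built with tf.keras. The layer widths, kernel sizes, segment length, and the names `residual_block_1d` / `build_vad_model` are assumptions made for this example, not the repository's actual architecture or API.

```python
import librosa
import numpy as np
import tensorflow as tf

N_MFCC = 20  # assumed number of MFCC coefficients

def residual_block_1d(x, filters, kernel_size=3):
    """Basic 1D residual block: two Conv1D layers plus a skip connection."""
    shortcut = x
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation("relu")(y)
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    if int(shortcut.shape[-1]) != filters:
        # Project the shortcut so channel counts match before the addition.
        shortcut = tf.keras.layers.Conv1D(filters, 1, padding="same")(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation("relu")(y)

def build_vad_model(num_frames, n_mfcc=N_MFCC):
    """Binary speech/noise classifier over a (frames, n_mfcc) MFCC matrix."""
    inputs = tf.keras.Input(shape=(num_frames, n_mfcc))
    x = tf.keras.layers.Conv1D(32, 7, padding="same", activation="relu")(inputs)
    x = residual_block_1d(x, 32)
    x = residual_block_1d(x, 64)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # speech vs. noise
    return tf.keras.Model(inputs, outputs)

# Stand-in for a one-second, 16 kHz audio segment (random noise for illustration).
audio = np.random.randn(16000).astype(np.float32)
mfcc = librosa.feature.mfcc(y=audio, sr=16000, n_mfcc=N_MFCC).T  # (frames, n_mfcc)

model = build_vad_model(num_frames=mfcc.shape[0])
speech_prob = model.predict(mfcc[np.newaxis, ...])[0, 0]
print(f"speech probability (untrained model): {speech_prob:.3f}")
```

Convolving over the time axis of the MFCC matrix is what lets the residual blocks capture the temporal structure of speech, which is the property the description above attributes to the 1D-ResNet.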
Quick Start & Requirements
Install with pip install -r requirements.txt followed by pip install -e . inside a Python 3.7.3 virtual environment. Docker is also supported (docker pull filippogrz/tf-vad:latest).
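Once installed, segment-level inference could look roughly like the sketch below. The checkpoint path, audio file, sample rate, segment length, and decision threshold are all hypothetical, since the repository's actual entry points and checkpoint format are not documented here.

```python
import librosa
import numpy as np
import tensorflow as tf

MODEL_PATH = "trained_vad_model.h5"  # hypothetical checkpoint path
AUDIO_PATH = "example.wav"           # hypothetical input file

SR = 16000          # assumed sample rate
SEGMENT_SEC = 1.0   # assumed segment length fed to the classifier
N_MFCC = 20         # assumed number of MFCC coefficients

model = tf.keras.models.load_model(MODEL_PATH)
audio, _ = librosa.load(AUDIO_PATH, sr=SR)

seg_len = int(SR * SEGMENT_SEC)
labels = []
for start in range(0, len(audio) - seg_len + 1, seg_len):
    segment = audio[start:start + seg_len]
    # (n_mfcc, frames) -> (frames, n_mfcc) so the network convolves over time.
    mfcc = librosa.feature.mfcc(y=segment, sr=SR, n_mfcc=N_MFCC).T
    prob = model.predict(mfcc[np.newaxis, ...])[0, 0]
    labels.append(1 if prob > 0.5 else 0)  # 1 = speech, 0 = noise

print(labels)
```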
Highlighted Details
Maintenance & Community
No community channels or notable contributors are mentioned in the README; the project appears to be a personal or academic effort. The repository's last activity was about two years ago and it is currently inactive.
Licensing & Compatibility
The README does not state a license, so reuse and redistribution terms are unclear. The project uses TensorFlow 1.15.4, which is released under the Apache 2.0 license and is compatible with commercial use.
Limitations & Caveats
The project targets Ubuntu 20.04 and Python 3.7.3 and depends on TensorFlow 1.15.4, an older 1.x release that is no longer actively maintained. Several features remain on the "Todo" list, including online inference and comparisons against baseline models, indicating the work is still in progress.