VAD tool using deep learning & TensorFlow
This project provides a deep learning-based Voice Activity Detection (VAD) system for researchers and developers who need to distinguish speech from noise in audio signals. It pairs a 1D-ResNet model with MFCC features for high accuracy, offering a robust building block for voice-enabled applications.
How It Works
The system extracts Mel-Frequency Cepstral Coefficients (MFCCs) from audio segments, transforming time-series audio data into a format suitable for deep learning. A 1D-Resnet architecture is then employed to classify these features, distinguishing between speech and noise. This approach is advantageous due to the ResNet's ability to handle sequential data effectively and its proven performance in classification tasks.
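As a rough illustration of this pipeline, the sketch below extracts MFCCs with librosa and feeds them through a small 1D residual network built with tf.keras. The layer widths, kernel sizes, segment length, and the names `residual_block_1d` / `build_vad_model` are assumptions made for this example, not the repository's actual architecture or API.

```python
import librosa
import numpy as np
import tensorflow as tf

N_MFCC = 20  # assumed number of MFCC coefficients

def residual_block_1d(x, filters, kernel_size=3):
    """Basic 1D residual block: two Conv1D layers plus a skip connection."""
    shortcut = x
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation("relu")(y)
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(y)
    y = tf.keras.layers.BatchNormalization()(y)
    if int(shortcut.shape[-1]) != filters:
        # Project the shortcut so channel counts match before the addition.
        shortcut = tf.keras.layers.Conv1D(filters, 1, padding="same")(shortcut)
    y = tf.keras.layers.Add()([y, shortcut])
    return tf.keras.layers.Activation("relu")(y)

def build_vad_model(num_frames, n_mfcc=N_MFCC):
    """Binary speech/noise classifier over a (frames, n_mfcc) MFCC matrix."""
    inputs = tf.keras.Input(shape=(num_frames, n_mfcc))
    x = tf.keras.layers.Conv1D(32, 7, padding="same", activation="relu")(inputs)
    x = residual_block_1d(x, 32)
    x = residual_block_1d(x, 64)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # speech vs. noise
    return tf.keras.Model(inputs, outputs)

# Stand-in for a one-second, 16 kHz audio segment (random noise for illustration).
audio = np.random.randn(16000).astype(np.float32)
mfcc = librosa.feature.mfcc(y=audio, sr=16000, n_mfcc=N_MFCC).T  # (frames, n_mfcc)

model = build_vad_model(num_frames=mfcc.shape[0])
speech_prob = model.predict(mfcc[np.newaxis, ...])[0, 0]
print(f"speech probability (untrained model): {speech_prob:.3f}")
```

Convolving over the time axis of the MFCC matrix is what lets the residual blocks capture the temporal structure of speech, which is the property the description above attributes to the 1D-ResNet.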
Quick Start & Requirements
Install with pip install -r requirements.txt followed by pip install -e . inside a Python 3.7.3 virtual environment. Docker is also supported (docker pull filippogrz/tf-vad:latest).
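Once installed, segment-level inference could look roughly like the sketch below. The checkpoint path, audio file, sample rate, segment length, and decision threshold are all hypothetical, since the repository's actual entry points and checkpoint format are not documented here.

```python
import librosa
import numpy as np
import tensorflow as tf

MODEL_PATH = "trained_vad_model.h5"  # hypothetical checkpoint path
AUDIO_PATH = "example.wav"           # hypothetical input file

SR = 16000          # assumed sample rate
SEGMENT_SEC = 1.0   # assumed segment length fed to the classifier
N_MFCC = 20         # assumed number of MFCC coefficients

model = tf.keras.models.load_model(MODEL_PATH)
audio, _ = librosa.load(AUDIO_PATH, sr=SR)

seg_len = int(SR * SEGMENT_SEC)
labels = []
for start in range(0, len(audio) - seg_len + 1, seg_len):
    segment = audio[start:start + seg_len]
    # (n_mfcc, frames) -> (frames, n_mfcc) so the network convolves over time.
    mfcc = librosa.feature.mfcc(y=segment, sr=SR, n_mfcc=N_MFCC).T
    prob = model.predict(mfcc[np.newaxis, ...])[0, 0]
    labels.append(1 if prob > 0.5 else 0)  # 1 = speech, 0 = noise

print(labels)
```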
Highlighted Details
Maintenance & Community
No community channels or notable contributors are mentioned in the README; the project appears to be a personal or academic effort. The repository's last activity was about two years ago and it is currently inactive.
Licensing & Compatibility
The README does not state a license, so reuse and redistribution terms are unclear. The project uses TensorFlow 1.15.4, which is released under the Apache 2.0 license and is compatible with commercial use.
Limitations & Caveats
The project targets Ubuntu 20.04 and Python 3.7.3 and depends on TensorFlow 1.15.4, an older 1.x release that is no longer actively maintained. Several features remain on the "Todo" list, including online inference and comparisons against baseline models, indicating the work is still in progress.