TensorFlow/Keras code for a priori SNR estimation (speech enhancement, robust ASR)
Deep Xi is a TensorFlow 2/Keras framework for estimating a priori Signal-to-Noise Ratio (SNR) for speech enhancement and robust Automatic Speech Recognition (ASR). It targets researchers and engineers in audio processing and speech technology, offering a deep learning approach to improve speech quality and intelligibility.
How It Works
Deep Xi uses deep neural networks to predict a mapped version of the a priori SNR from the short-time magnitude spectrum of the noisy speech. The mapping is the cumulative distribution function (CDF) of the instantaneous a priori SNR, computed from training data statistics; it compresses the target into the interval [0, 1], which improves convergence. During inference, the a priori SNR estimate is recovered from the network output by inverting this mapping with the same sample statistics. The recovered estimate can then be used in various speech processing pipelines, including MMSE-based enhancement and mask estimation.
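The mapping and its inverse can be illustrated with a minimal sketch. It is not the repository's code: the function names (fit_stats, map_snr, unmap_snr, wiener_gain) are hypothetical, the statistics are computed for a single frequency bin, and the instantaneous a priori SNR in dB is assumed to be Gaussian so that its CDF has a closed form. The Wiener gain at the end is only one example of a gain function driven by the a priori SNR.

```python
import math
from statistics import NormalDist, mean, stdev


def snr_db(xi: float) -> float:
    """Convert a linear (power) SNR to decibels."""
    return 10.0 * math.log10(xi)


def fit_stats(xi_train: list[float]) -> tuple[float, float]:
    """Sample mean and standard deviation of the training SNR in dB.

    Assumption: one frequency bin only; a real system would keep
    one (mu, sigma) pair per bin.
    """
    db = [snr_db(x) for x in xi_train]
    return mean(db), stdev(db)


def map_snr(xi: float, mu: float, sigma: float) -> float:
    """Training target: Gaussian CDF of the SNR in dB, a value in (0, 1)."""
    return NormalDist(mu, sigma).cdf(snr_db(xi))


def unmap_snr(xi_bar: float, mu: float, sigma: float, eps: float = 1e-12) -> float:
    """Inference: invert the CDF to recover a linear a priori SNR estimate."""
    # inv_cdf is only defined on the open interval (0, 1), so clip first.
    p = min(max(xi_bar, eps), 1.0 - eps)
    return 10.0 ** (NormalDist(mu, sigma).inv_cdf(p) / 10.0)


def wiener_gain(xi: float) -> float:
    """Wiener filter gain as a function of the a priori SNR."""
    return xi / (1.0 + xi)


# Toy training values for one frequency bin (made up for illustration).
xi_train = [0.1, 0.5, 1.0, 2.0, 10.0]
mu, sigma = fit_stats(xi_train)

target = map_snr(2.0, mu, sigma)       # what a network would be trained to predict
xi_hat = unmap_snr(target, mu, sigma)  # recovered a priori SNR at inference
gain = wiener_gain(xi_hat)             # gain applied to the noisy magnitude
```

Because the target lies in (0, 1), a network with a sigmoidal output layer can predict it directly; clipping in unmap_snr guards against outputs that saturate at exactly 0 or 1.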
Quick Start & Requirements
Install dependencies after cloning the repository: pip install -r requirements.txt
Run script: run.sh
Highlighted Details
Maintenance & Community
The project is associated with multiple research papers, which indicates academic backing. Links to the relevant papers and datasets are provided.
Licensing & Compatibility
The repository does not explicitly state a license. However, the inclusion of research papers and datasets suggests a focus on academic use. Commercial use would require clarification of licensing terms.
Limitations & Caveats
The ResLSTM network is noted as not performing as expected in this TensorFlow 2 framework compared to the TensorFlow 1.x implementations. The project primarily targets single-channel audio at a 16 kHz sampling frequency, though both settings can be configured.