Official implementations of speech separation research papers
This repository provides official implementations for multichannel speech separation, denoising, and dereverberation, targeting researchers and engineers in audio signal processing. It offers advanced neural network models like SpatialNet and NBC2, enabling high-quality audio enhancement for complex acoustic environments.
How It Works
The project leverages PyTorch Lightning for efficient training and experimentation. It implements advanced neural network architectures, including Conformers and recurrent neural networks, designed to process multichannel audio signals. Key techniques include full-band permutation invariant training (PIT) for robust source separation and extensive learning of spatial information for joint enhancement tasks.
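To make the PIT idea concrete, here is a minimal sketch of utterance-level permutation invariant MSE in PyTorch. It is illustrative only: the repository's full-band PIT applies the same principle with its own loss and signal representation, and the function name and tensor shapes below are assumptions.

    import itertools
    import torch

    def pit_mse(est: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # est, ref: (batch, n_src, n_samples). For each utterance, evaluate
        # the MSE under every source permutation and keep the best match.
        n_src = est.shape[1]
        # Pairwise MSE between each estimate and each reference: (batch, n_src, n_src).
        pair = ((est.unsqueeze(2) - ref.unsqueeze(1)) ** 2).mean(dim=-1)
        per_perm = []
        for perm in itertools.permutations(range(n_src)):
            idx = torch.tensor(perm)
            # Loss when estimate i is matched to reference perm[i].
            per_perm.append(pair[:, torch.arange(n_src), idx].mean(dim=-1))
        # Pick the permutation with the lowest loss for each utterance.
        return torch.stack(per_perm, dim=-1).min(dim=-1).values.mean()

    est = torch.randn(4, 2, 16000)  # 4 utterances, 2 estimated sources
    ref = torch.randn(4, 2, 16000)  # matching reference sources
    loss = pit_mse(est, ref)

Because the loss is minimized over all source orderings, the network is never penalized for emitting the sources in a different order than the labels, which is what makes separation training stable.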
Quick Start & Requirements
pip install -r requirements.txt

gpuRIR is also required (see https://github.com/DavidDiazGuerra/gpuRIR; a usage sketch follows below).

Example training command (SpatialNet on the SMS-WSJ-Plus dataset):

python SharedTrainer.py fit --config=configs/SpatialNet.yaml --config=configs/datasets/sms_wsj_plus.yaml --model.channels=[0,1,2,3,4,5] --trainer.precision=bf16-mixed --model.compile=true --trainer.devices=0,
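gpuRIR simulates room impulse responses on the GPU. As a pointer, here is a minimal sketch of its basic API (beta_SabineEstimation, att2t_SabineEstimator, t2n, simulateRIR); the geometry and reverberation time are illustrative values, not the repository's dataset settings.

    import numpy as np
    import gpuRIR  # requires a CUDA-capable GPU

    fs = 16000
    room_sz = [5.0, 4.0, 3.0]                # room dimensions in meters (illustrative)
    pos_src = np.array([[1.0, 1.5, 1.5]])    # one source position
    pos_rcv = np.array([[2.5 + 0.05 * m, 2.0, 1.5] for m in range(6)])  # 6-mic linear array
    T60 = 0.4                                # reverberation time in seconds

    beta = gpuRIR.beta_SabineEstimation(room_sz, T60)       # wall reflection coefficients
    Tmax = gpuRIR.att2t_SabineEstimator(60.0, T60)          # simulate until 60 dB decay
    nb_img = gpuRIR.t2n(Tmax, room_sz)                      # image sources per axis
    rirs = gpuRIR.simulateRIR(room_sz, beta, pos_src, pos_rcv, nb_img, Tmax, fs)
    print(rirs.shape)  # (n_src, n_rcv, rir_len)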
Highlighted Details

Supports mixed-precision training (bf16-mixed) and model compilation via torch.compile (torch>=2.0) for faster training.
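To show what these two options do, here is a minimal, self-contained sketch; the toy module below stands in for the actual model, and in the repository these settings are wired through the CLI flags --model.compile=true and --trainer.precision=bf16-mixed shown above.

    import torch

    # Toy stand-in for the actual network (illustrative only).
    net = torch.nn.Sequential(
        torch.nn.Linear(257, 96),
        torch.nn.ReLU(),
        torch.nn.Linear(96, 257),
    )

    # Graph compilation, as enabled by --model.compile=true (requires torch>=2.0).
    net = torch.compile(net)

    x = torch.randn(8, 100, 257)  # (batch, frames, features)
    # bf16 mixed precision, as selected by --trainer.precision=bf16-mixed.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = net(x)
    print(y.shape)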
Maintenance & Community
The project is associated with Westlake University's audio research group. Further information about the group can be found at https://audio.westlake.edu.cn.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Users should verify licensing terms before commercial use or integration into closed-source projects.
Limitations & Caveats
The README mentions that training commands for NB-BLSTM/NBC/NBC2 are in the NBSS branch, implying potential differences in setup or features compared to the main branch. The model.compile=true feature requires torch>=2.0.