Official implementations of speech separation research papers
This repository provides official implementations for multichannel speech separation, denoising, and dereverberation, targeting researchers and engineers in audio signal processing. It offers advanced neural network models like SpatialNet and NBC2, enabling high-quality audio enhancement for complex acoustic environments.
How It Works
The project leverages PyTorch Lightning for efficient training and experimentation. It implements advanced neural network architectures, including Conformers and recurrent neural networks, designed to process multichannel audio signals. Key techniques include full-band permutation invariant training (PIT) for robust source separation and extensive learning of spatial information for joint enhancement tasks.
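To make the PIT idea concrete, here is a minimal sketch of utterance-level permutation invariant MSE in PyTorch. It is illustrative only: the repository's full-band PIT applies the same principle with its own loss and signal representation, and the function name and tensor shapes below are assumptions.

    import itertools
    import torch

    def pit_mse(est: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # est, ref: (batch, n_src, n_samples). For each utterance, evaluate
        # the MSE under every source permutation and keep the best match.
        n_src = est.shape[1]
        # Pairwise MSE between each estimate and each reference: (batch, n_src, n_src).
        pair = ((est.unsqueeze(2) - ref.unsqueeze(1)) ** 2).mean(dim=-1)
        per_perm = []
        for perm in itertools.permutations(range(n_src)):
            idx = torch.tensor(perm)
            # Loss when estimate i is matched to reference perm[i].
            per_perm.append(pair[:, torch.arange(n_src), idx].mean(dim=-1))
        # Pick the permutation with the lowest loss for each utterance.
        return torch.stack(per_perm, dim=-1).min(dim=-1).values.mean()

    est = torch.randn(4, 2, 16000)  # 4 utterances, 2 estimated sources
    ref = torch.randn(4, 2, 16000)  # matching reference sources
    loss = pit_mse(est, ref)

Because the loss is minimized over all source orderings, the network is never penalized for emitting the sources in a different order than the labels, which is what makes separation training stable.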
Quick Start & Requirements
pip install -r requirements.txt

gpuRIR is also required (see https://github.com/DavidDiazGuerra/gpuRIR; a usage sketch follows below).

Example training command (SpatialNet on the SMS-WSJ-Plus dataset):

python SharedTrainer.py fit --config=configs/SpatialNet.yaml --config=configs/datasets/sms_wsj_plus.yaml --model.channels=[0,1,2,3,4,5] --trainer.precision=bf16-mixed --model.compile=true --trainer.devices=0,
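gpuRIR simulates room impulse responses on the GPU. As a pointer, here is a minimal sketch of its basic API (beta_SabineEstimation, att2t_SabineEstimator, t2n, simulateRIR); the geometry and reverberation time are illustrative values, not the repository's dataset settings.

    import numpy as np
    import gpuRIR  # requires a CUDA-capable GPU

    fs = 16000
    room_sz = [5.0, 4.0, 3.0]                # room dimensions in meters (illustrative)
    pos_src = np.array([[1.0, 1.5, 1.5]])    # one source position
    pos_rcv = np.array([[2.5 + 0.05 * m, 2.0, 1.5] for m in range(6)])  # 6-mic linear array
    T60 = 0.4                                # reverberation time in seconds

    beta = gpuRIR.beta_SabineEstimation(room_sz, T60)       # wall reflection coefficients
    Tmax = gpuRIR.att2t_SabineEstimator(60.0, T60)          # simulate until 60 dB decay
    nb_img = gpuRIR.t2n(Tmax, room_sz)                      # image sources per axis
    rirs = gpuRIR.simulateRIR(room_sz, beta, pos_src, pos_rcv, nb_img, Tmax, fs)
    print(rirs.shape)  # (n_src, n_rcv, rir_len)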
Highlighted Details

Supports mixed-precision training (bf16-mixed) and model compilation via torch.compile (torch>=2.0) for faster training.
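To show what these two options do, here is a minimal, self-contained sketch; the toy module below stands in for the actual model, and in the repository these settings are wired through the CLI flags --model.compile=true and --trainer.precision=bf16-mixed shown above.

    import torch

    # Toy stand-in for the actual network (illustrative only).
    net = torch.nn.Sequential(
        torch.nn.Linear(257, 96),
        torch.nn.ReLU(),
        torch.nn.Linear(96, 257),
    )

    # Graph compilation, as enabled by --model.compile=true (requires torch>=2.0).
    net = torch.compile(net)

    x = torch.randn(8, 100, 257)  # (batch, frames, features)
    # bf16 mixed precision, as selected by --trainer.precision=bf16-mixed.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        y = net(x)
    print(y.shape)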
Maintenance & Community
The project is associated with Westlake University's audio research group. Further information about the group can be found at https://audio.westlake.edu.cn.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Users should verify licensing terms before commercial use or integration into closed-source projects.
Limitations & Caveats
The README mentions that training commands for NB-BLSTM/NBC/NBC2 are in the NBSS branch, implying potential differences in setup or features compared to the main branch. The model.compile=true feature requires torch>=2.0.