NBSS by Audio-WestlakeU

Speech separation research paper implementation

Created 4 years ago
301 stars

Top 88.6% on SourcePulse

Project Summary

This repository provides the official implementations of several models for multichannel speech separation, denoising, and dereverberation. It is aimed at researchers and engineers in audio signal processing. The included networks, such as SpatialNet and NBC2, enhance speech recorded in noisy, reverberant, multi-speaker conditions.

How It Works

The project uses PyTorch Lightning to organize training and experiments. Its networks, which include Conformer-based and recurrent architectures, take multichannel audio as input. Two techniques are central: full-band permutation invariant training (PIT), which resolves the ambiguity in assigning network outputs to speakers during separation, and extensive learning of spatial information, which lets a single model perform separation, denoising, and dereverberation jointly.
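To make the idea behind permutation invariant training concrete, here is a minimal sketch in plain Python. It is not the repository's code: the names pit_loss and mse are hypothetical, the mean squared error is a stand-in for whatever loss the models actually use, and "full-band" is read here as choosing one permutation for the whole signal rather than one per frequency, which is an assumption.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Signal = Sequence[float]


def mse(estimate: Signal, target: Signal) -> float:
    """Mean squared error between two equal-length signals (stand-in loss)."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def pit_loss(
    estimates: Sequence[Signal],
    targets: Sequence[Signal],
    pair_loss: Callable[[Signal, Signal], float] = mse,
) -> Tuple[float, Tuple[int, ...]]:
    """Return the lowest average loss over all output-to-speaker assignments.

    perm[i] is the index of the target matched to estimates[i].
    Hypothetical helper for illustration; the search is brute force,
    so it is only practical for a small number of speakers.
    """
    if len(estimates) != len(targets):
        raise ValueError("need one estimate per target")
    n = len(targets)
    best_loss = float("inf")
    best_perm: Tuple[int, ...] = ()
    for perm in permutations(range(n)):
        loss = sum(pair_loss(estimates[i], targets[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# The network emitted the two speakers in swapped order.
estimates = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
targets = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
loss, perm = pit_loss(estimates, targets)
print(loss, perm)
```

Because the loss is taken as the minimum over assignments, the swapped output order in the example is not penalized; only the quality of the best match counts.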

Quick Start & Requirements

Highlighted Details

  • Implements multiple state-of-the-art models: NB-BLSTM, NBC, NBC2, SpatialNet, and online SpatialNet.
  • Supports advanced training features like mixed-precision (bf16-mixed) and model compilation (torch>=2.0) for faster training.
  • Offers detailed configuration files for network architecture and datasets.
  • Includes specific commands for dataset generation (e.g., SMS-WSJ-Plus with RIRs).

Maintenance & Community

The project is associated with Westlake University's audio research group. Further information about the group can be found at https://audio.westlake.edu.cn.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

According to the README, the training commands for NB-BLSTM, NBC, and NBC2 live in the NBSS branch, so setup and features may differ from the main branch. The model.compile=true option requires torch>=2.0.
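The torch>=2.0 requirement can be checked before enabling compilation. The sketch below is an illustration only: supports_compile and resolve_compile_flag are hypothetical helpers, not part of the repository, and the version string is passed in as plain text so the example runs without PyTorch installed. In practice the string would come from torch.__version__.

```python
import re


def supports_compile(torch_version: str) -> bool:
    """Return True if the given PyTorch version string is 2.0 or newer.

    Hypothetical helper; accepts strings such as "2.1.0+cu118" or "1.13.1".
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major = int(match.group(1))
    return major >= 2


def resolve_compile_flag(requested: bool, torch_version: str) -> bool:
    """Keep the compile request only when the installed version allows it."""
    return requested and supports_compile(torch_version)


print(resolve_compile_flag(True, "2.1.0+cu118"))
print(resolve_compile_flag(True, "1.13.1"))
```

Falling back silently is one possible policy; raising an error when compilation is requested on an older version would be an equally reasonable choice.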

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 30 days