SincNet by mravanelli

Neural audio processing with SincNet

Created 7 years ago
1,196 stars

Top 32.7% on SourcePulse

Project Summary

SincNet is a Convolutional Neural Network (CNN) architecture for processing raw audio samples efficiently. It targets applications such as speaker identification and learns filters in a more compact and interpretable way than a standard CNN. Because only the low and high cutoff frequencies of each filter are optimized, training yields a filter bank customized for the audio task at hand.

How It Works

SincNet uses parametrized sinc functions to implement band-pass filters in its first convolutional layer. Unlike a traditional CNN, which learns every element of every filter, SincNet learns only each filter's low and high cutoff frequencies. This greatly reduces the number of learnable parameters in that layer and makes the model more compact. The rest of the architecture typically consists of further convolutional and fully-connected layers for classification.
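
As a rough illustration of the idea, the sketch below builds one band-pass filter from just two numbers, a low and a high cutoff, as the difference of two windowed sinc low-pass filters. It is a minimal standard-library sketch and not the repository's SincConv code: the function names (sinc_bandpass, gain_at), the 16 kHz sample rate, the 101-tap length, and the Hamming window are all assumptions chosen for the example.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x) / x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=101, sample_rate=16000):
    """Return FIR taps of a band-pass filter defined only by two cutoffs.

    Hypothetical helper for illustration. The ideal response is
    g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), with f1 and f2
    in cycles per sample, smoothed by a Hamming window.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the filter is symmetric")
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    half = (kernel_size - 1) // 2
    taps = []
    for i in range(kernel_size):
        n = i - half  # time index centered on zero
        ideal = (2 * f2 * sinc(2 * math.pi * f2 * n)
                 - 2 * f1 * sinc(2 * math.pi * f1 * n))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        taps.append(ideal * window)
    return taps


def gain_at(taps, freq_hz, sample_rate=16000):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)


if __name__ == "__main__":
    # Assumed example band: 500 Hz to 1500 Hz.
    taps = sinc_bandpass(500.0, 1500.0)
    print("in-band gain (1 kHz):", round(gain_at(taps, 1000.0), 3))
    print("out-of-band gain (4 kHz):", round(gain_at(taps, 4000.0), 3))
```

In this sketch the whole 101-tap filter is determined by two values, so a gradient-based optimizer would only need to update those two cutoffs per filter rather than every tap.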

Quick Start & Requirements

  • Installation: Requires Python 3.6 or 2.7 and PyTorch 1.0, plus pysoundfile (conda install -c conda-forge pysoundfile). An Anaconda environment is suggested.
  • Prerequisites: Linux OS.
  • Example: The README provides a detailed walkthrough for speaker identification using the TIMIT database, including data preparation and running the experiment.
  • Resource Estimate: Training on a TITAN X GPU took approximately 24 hours for the TIMIT example.

Highlighted Details

  • Offers SincConv_fast, a faster implementation of the sinc-based convolution layer (reported 50% speed improvement).
  • SincNet is also integrated into the SpeechBrain toolkit for broader speech processing applications.
  • The project provides a trained model for TIMIT.

Maintenance & Community

  • The project was last updated around February 2019, with plans for integration into PyTorch-Kaldi. The current repository serves as a showcase.
  • Cite: Mirco Ravanelli and Yoshua Bengio, “Speaker Recognition from Raw Waveform with SincNet”.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project appears to be a showcase with limited updates since early 2019. While it demonstrates SincNet's capabilities, further development or support might be limited. The README mentions that several potential code optimizations are not implemented.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 30 days
