SEMamba by RoyChao19477

Mamba-based speech enhancement models

Created 2 years ago

272 stars

Top 94.6% on SourcePulse

Project Summary

SEMamba is the official implementation of speech enhancement (SE) models built on the Mamba architecture and designed for universal, robust, and generalizable performance. A single model handles diverse audio distortions and sampling frequencies. The project targets researchers and engineers in audio signal processing, and it placed 4th in the URGENT challenge at IEEE SLT 2024.

How It Works

The project integrates the Mamba architecture into speech enhancement pipelines so that one model can handle a wide range of audio degradations, including additive noise, reverberation, clipping, and bandwidth limitation. The core advantage is Mamba's sequence modeling capability, which supports a unified approach across sampling rates and distortion types and is intended to improve robustness and generalization.
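To make the degradation types above concrete, the following is a minimal sketch of how two of them (additive noise and clipping) could be simulated on a test signal. It is not taken from the SEMamba repository: the function names, the sine-tone stand-in for speech, and the chosen SNR and clipping threshold are all illustrative assumptions, and plain Python is used only to keep the sketch dependency-free.

```python
import math
import random


def add_noise(clean, target_snr_db, rng):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = signal_power / (10 ** (target_snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in clean]


def clip(signal, threshold):
    """Hard-clip every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, x)) for x in signal]


def measure_snr_db(clean, degraded):
    """Measure the SNR (dB) of a degraded signal against its clean reference."""
    signal_power = sum(x * x for x in clean)
    error_power = sum((d - c) ** 2 for c, d in zip(clean, degraded))
    return 10 * math.log10(signal_power / error_power)


# Assumption: one second of a 440 Hz tone at 16 kHz stands in for clean speech.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = add_noise(clean, target_snr_db=10, rng=rng)
clipped = clip(clean, threshold=0.25)

print(f"measured SNR of noisy signal: {measure_snr_db(clean, noisy):.2f} dB")
print(f"peak of clipped signal: {max(abs(x) for x in clipped):.2f}")
```

A speech enhancement model is trained to map signals like `noisy` or `clipped` back toward `clean`; reverberation and bandwidth limitation would need a room impulse response and a resampling step, which are omitted here.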

Quick Start & Requirements

Installation: The recommended setup is to create a Conda environment (python=3.9), install PyTorch 2.2.2, run pip install -r requirements.txt, and then install Mamba from source (cd mamba_install && pip install .). Docker environments for x86 and ARM are also available.
Prerequisites: Python >= 3.9, CUDA >= 12.0, PyTorch == 2.2.2. Requires GPUs from the RTX series or newer (e.g., A100, RTX 4090, RTX 3090, GH200).
Links: Live demo at https://huggingface.co/spaces/rc19477/Speech_Enhancement_Mamba
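The version requirements listed above can be checked before installing Mamba. The sketch below is one possible way to do that and is not part of the SEMamba repository; `parse_version` and `check_prerequisites` are hypothetical helper names. It relies only on `torch.__version__` and `torch.version.cuda`, and it reports that PyTorch is missing instead of failing if the import does not succeed.

```python
import sys


def parse_version(text):
    """Turn '2.2.2+cu121' into (2, 2, 2), ignoring any local build suffix."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def check_prerequisites(python_version, torch_version, cuda_version):
    """Return a list of problems; an empty list means all checks pass."""
    problems = []
    if parse_version(python_version) < (3, 9):
        problems.append(f"Python {python_version} is older than 3.9")
    if parse_version(torch_version) != (2, 2, 2):
        problems.append(f"PyTorch {torch_version} is not 2.2.2")
    if cuda_version is None or parse_version(cuda_version) < (12, 0):
        problems.append(f"CUDA {cuda_version} is older than 12.0 or unavailable")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        current_python = ".".join(str(n) for n in sys.version_info[:3])
        found = check_prerequisites(current_python, torch.__version__, torch.version.cuda)
        for problem in found:
            print(problem)
        if not found:
            print("All version prerequisites are satisfied")
```

Keeping the comparison logic in a function that takes plain strings makes it easy to test without a GPU or a PyTorch install.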

Highlighted Details

Ranked 4th out of 70 teams in the URGENT challenge (IEEE SLT 2024), with the work presented at NeurIPS 2024.
Features a live HuggingFace demo for direct audio enhancement.
Offers pre-built Docker images for simplified deployment on x86 and ARM architectures.
Implements Perceptual Contrast Stretching (PCS) as an optional training target or post-processing step.

Maintenance & Community

The README provides no explicit community channels (e.g., Discord, Slack), roadmap, or detailed contributor information.

Licensing & Compatibility

The repository's license is not specified in the README, which is a critical omission for assessing commercial use or derivative works.

Limitations & Caveats

Hardware: Limited to RTX series GPUs and newer; older models like GTX 1080 Ti or Tesla V100 may not be supported.
CUDA Issues: Users experiencing CUDA problems are advised to switch to the mamba-2 branch for potential compatibility improvements.
Installation: Follow the installation steps closely, including installing Mamba from source, to avoid dependency conflicts.
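One way to screen for the hardware limitation above is to compare a GPU's CUDA compute capability against a cutoff. The sketch below is an assumption-laden illustration, not the project's own check: the 7.5 cutoff (the first RTX generation) is chosen only because it is consistent with the GPUs named in this summary, and `is_supported` is a hypothetical helper. The capability values in the table are NVIDIA's published figures for those cards.

```python
# ASSUMPTION: compute capability 7.5 (first RTX generation) is used as the cutoff.
# The SEMamba README does not state a numeric threshold.
MIN_CAPABILITY = (7, 5)

# Published CUDA compute capabilities for the GPUs named in this summary.
KNOWN_CAPABILITIES = {
    "GTX 1080 Ti": (6, 1),
    "Tesla V100": (7, 0),
    "A100": (8, 0),
    "RTX 3090": (8, 6),
    "RTX 4090": (8, 9),
    "GH200": (9, 0),
}


def is_supported(capability, minimum=MIN_CAPABILITY):
    """Return True when a (major, minor) capability meets the assumed cutoff."""
    return tuple(capability) >= tuple(minimum)


for name, capability in KNOWN_CAPABILITIES.items():
    status = "likely supported" if is_supported(capability) else "may not be supported"
    print(f"{name}: {capability[0]}.{capability[1]} -> {status}")

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    # Check the local GPU only when PyTorch and a CUDA device are present.
    if torch is not None and torch.cuda.is_available():
        local = torch.cuda.get_device_capability(0)
        print(f"local GPU capability {local}: supported={is_supported(local)}")
```

If a card falls below the cutoff or CUDA errors appear anyway, the mamba-2 branch mentioned above is the documented fallback.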

Health Check

Last Commit: 7 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

5 stars in the last 30 days
