LavaSR by ysharma3501

Ultra-fast speech enhancement and restoration

Created 5 months ago

560 stars

Top 56.6% on SourcePulse

Project Summary

LavaSR is a lightweight, high-performance speech enhancement model that restores low-quality audio to clean, crisp sound. It targets three needs of developers and researchers: enhancing Text-to-Speech (TTS) output, processing audio in real time on resource-constrained devices, and improving the quality of audio datasets. LavaSR offers substantial speedups and quality improvements, surpassing many diffusion-based models.

How It Works

LavaSR adapts the Vocos architecture for bandwidth extension (BWE) and adds a novel Linkwitz-Riley inspired refiner. Vocos is isotropic and processes audio in a single pass, which makes inference significantly faster than traditional time-domain or diffusion-based methods. The refiner further improves fidelity, so the model both upsamples and denoises the audio.
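The summary describes the refiner only as "Linkwitz-Riley inspired", so the following is a minimal sketch of a classic fourth-order Linkwitz-Riley crossover, not LavaSR's actual refiner. It assumes the idea is to keep the low band of one signal (for example the original input) and the high band of another (for example the bandwidth-extended output) and sum them. All function names and the 4 kHz crossover frequency are hypothetical.

```python
import cmath
import math


def butterworth_biquad(kind, cutoff_hz, sample_rate):
    """Return normalized (b, a) coefficients of a 2nd-order Butterworth section."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2 * Q) with Q = 1/sqrt(2)
    if kind == "low":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "high":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'low' or 'high'")
    a0 = 1 + alpha
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return [x / a0 for x in b], a


def biquad_filter(samples, b, a):
    """Apply one biquad section (direct form I) to a list of floats."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def lr4_band(samples, kind, cutoff_hz, sample_rate):
    """4th-order Linkwitz-Riley band: the same Butterworth section applied twice."""
    b, a = butterworth_biquad(kind, cutoff_hz, sample_rate)
    return biquad_filter(biquad_filter(samples, b, a), b, a)


def crossover_merge(low_source, high_source, cutoff_hz, sample_rate):
    """Sum the low band of one signal with the high band of another."""
    low = lr4_band(low_source, "low", cutoff_hz, sample_rate)
    high = lr4_band(high_source, "high", cutoff_hz, sample_rate)
    return [lo + hi for lo, hi in zip(low, high)]


def summed_magnitude(freq_hz, cutoff_hz, sample_rate):
    """Magnitude of the low band plus high band response at one frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)

    def response(kind):
        b, a = butterworth_biquad(kind, cutoff_hz, sample_rate)
        num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
        den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
        return (num / den) ** 2  # section applied twice

    return abs(response("low") + response("high"))
```

The property that makes this crossover attractive is that the two bands sum to a flat magnitude response, so `summed_magnitude(f, 4000, 48000)` stays at 1 for any `f` below the Nyquist frequency. Merging a signal with itself therefore changes only its phase, not its spectrum.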

Quick Start & Requirements

Installation: uv pip install git+https://github.com/ysharma3501/LavaSR.git
Prerequisites: Python, PyTorch (for device selection: 'cpu', 'cuda', 'mps').
Demos & Tools: Integrations include a ComfyUI node and a GUI application. A Hugging Face Spaces demo is also available.
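The prerequisites mention choosing between 'cpu', 'cuda', and 'mps'. A minimal sketch of picking one of those strings with standard PyTorch availability checks follows. The helper name `pick_device` is hypothetical, and how LavaSR itself accepts the device string is not shown in this summary.

```python
def pick_device(preferred=None):
    """Return 'cuda', 'mps', or 'cpu' based on what PyTorch reports as usable."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report 'cpu' as the default.
        return "cpu"

    available = {"cpu"}
    if torch.cuda.is_available():
        available.add("cuda")
    # torch.backends.mps may be missing on older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        available.add("mps")

    # Honor an explicit request when that backend is usable.
    if preferred in available:
        return preferred
    # Otherwise prefer an accelerator over the CPU.
    for name in ("cuda", "mps"):
        if name in available:
            return name
    return "cpu"


device = pick_device()
```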

Highlighted Details

Speed: Achieves over 5000x realtime on GPUs and 50x realtime on CPUs, with a model size of approximately 50MB.
Quality: Outperforms diffusion models, with log-spectral distance (LSD) scores competitive with state-of-the-art methods such as AP-BWE.
Efficiency: Uses about 500MB of VRAM, making it suitable for edge devices.
Flexibility: Supports universal input sampling rates ranging from 8kHz to 48kHz without requiring specific input formats.
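To put the speed figures above in concrete terms, here is a small arithmetic sketch. It treats "Nx realtime" as meaning that N seconds of audio are processed per second of compute, and it uses the quoted 5000x and 50x figures as lower-bound examples. The function name is hypothetical.

```python
def processing_seconds(audio_seconds, realtime_factor):
    """Compute time needed for a clip at a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 3600.0
gpu_seconds = processing_seconds(one_hour, 5000)  # 0.72 s at 5000x realtime
cpu_seconds = processing_seconds(one_hour, 50)    # 72 s at 50x realtime
```

At these rates, an hour of speech would take under a second on a GPU and a little over a minute on a CPU.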

Maintenance & Community

The project is under active development, with an Interspeech paper in preparation to detail its technical contributions. Community engagement is facilitated through a dedicated ComfyUI node and a GUI application. Direct inquiries can be sent to yatharthsharma3501@gmail.com.

Licensing & Compatibility

LavaSR is released under the Apache-2.0 license. This permissive license allows commercial use and integration into closed-source applications, subject to its attribution and notice requirements.

Limitations & Caveats

Training code and models optimized for music and general audio processing have not yet been released.

Explore Similar Projects

HiFTNet by yl4579

LinaCodec by ysharma3501

LongCat-Audio-Codec by meituan-longcat

SEMamba by RoyChao19477

Kitten-TTS-Server by devnen

MiraTTS by ysharma3501

VITA-Audio by VITA-MLLM

NovaSR by ysharma3501

soprano by ekwek1

WavTokenizer by jishengpeng

GPA by AutoArk

LuxTTS by ysharma3501