LavaSR by ysharma3501

Ultra-fast speech enhancement and restoration

Created 2 months ago
502 stars

Top 61.8% on SourcePulse

Project Summary

LavaSR is a lightweight, high-performance speech enhancement model that restores low-quality audio to clean, crisp sound. It targets developers and researchers who need to enhance Text-to-Speech (TTS) output, process audio in real time on resource-constrained devices, or improve the quality of audio datasets. It is reported to deliver substantial speedups and quality improvements, surpassing many diffusion-based models.

How It Works

The core innovation is an adaptation of the Vocos architecture for bandwidth extension (BWE), combined with a novel Linkwitz-Riley-inspired refiner. Vocos is isotropic and generates its output in a single pass, which makes inference significantly faster than traditional time-domain or diffusion-based methods. The refiner further improves fidelity, so the model upsamples and denoises audio efficiently.
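The source does not describe the refiner's internals, so the following is only background on the crossover that gives it its name. It is a minimal sketch of a classic 4th-order Linkwitz-Riley (LR4) crossover: two cascaded 2nd-order Butterworth sections per band, giving a low band and a high band that sum back to a flat magnitude response. The function names (`butterworth_biquad`, `biquad_filter`, `lr4_split`, `rms`) and the 4 kHz cutoff are illustrative assumptions, not LavaSR code.

```python
import math


def butterworth_biquad(kind, cutoff_hz, sample_rate):
    """Return (b, a) for a 2nd-order Butterworth section (Q = 1/sqrt(2)).

    Coefficients follow the widely used audio EQ cookbook formulas and are
    normalised so that a[0] == 1.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2 * Q) with Q = 1/sqrt(2)
    if kind == "low":
        b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    elif kind == "high":
        b = [(1.0 + cos_w0) / 2.0, -(1.0 + cos_w0), (1.0 + cos_w0) / 2.0]
    else:
        raise ValueError("kind must be 'low' or 'high'")
    a0 = 1.0 + alpha
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return [coeff / a0 for coeff in b], a


def biquad_filter(samples, b, a):
    """Apply one biquad section (direct form I) to a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def lr4_split(samples, cutoff_hz, sample_rate):
    """Split a signal into LR4 low and high bands (each Butterworth applied twice)."""
    low_b, low_a = butterworth_biquad("low", cutoff_hz, sample_rate)
    high_b, high_a = butterworth_biquad("high", cutoff_hz, sample_rate)
    low = biquad_filter(biquad_filter(samples, low_b, low_a), low_b, low_a)
    high = biquad_filter(biquad_filter(samples, high_b, high_a), high_b, high_a)
    return low, high


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    sample_rate = 16000
    # A 1 kHz test tone, well below the illustrative 4 kHz crossover.
    tone = [math.sin(2.0 * math.pi * 1000 * n / sample_rate) for n in range(3200)]
    low, high = lr4_split(tone, cutoff_hz=4000, sample_rate=sample_rate)
    recombined = [lo + hi for lo, hi in zip(low, high)]
    # Measure levels on the tail, after the filter transient has decayed.
    print(f"low band RMS:   {rms(low[-1600:]):.4f}")
    print(f"high band RMS:  {rms(high[-1600:]):.4f}")
    print(f"recombined RMS: {rms(recombined[-1600:]):.4f}")
```

In a bandwidth-extension setting, one plausible use of such a split (an assumption, not something the source confirms) is to keep the low band of the original recording and take only the high band from the model's output, since the two bands recombine without a level bump or dip at the crossover frequency.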

Quick Start & Requirements

  • Installation: uv pip install git+https://github.com/ysharma3501/LavaSR.git
  • Prerequisites: Python and PyTorch. The target device is selected as 'cpu', 'cuda', or 'mps'.
  • Demos & Tools: Integrations include a ComfyUI node and a GUI application. A Hugging Face Spaces demo is also available.
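The device names listed under Prerequisites ('cpu', 'cuda', 'mps') are standard PyTorch device strings. A minimal sketch of choosing one at runtime might look like the following. `pick_device` and `detect_device` are hypothetical helper names, not part of LavaSR, and the sketch stops at producing the string because the source does not document how LavaSR's own API accepts it.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an NVIDIA GPU, then Apple Silicon (MPS), then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to 'cpu' if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent on older PyTorch builds, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(f"Selected device: {detect_device()}")
```

Keeping the preference order in a pure function makes it easy to test without any particular hardware present.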

Highlighted Details

  • Speed: Runs at over 5000x realtime on GPUs and 50x realtime on CPUs, with a model size of approximately 50 MB.
  • Quality: Reported to outperform diffusion-based models, with Log-Spectral Distance (LSD) scores competitive with state-of-the-art methods such as AP-BWE.
  • Efficiency: Uses around 500 MB of VRAM, making it suitable for edge devices.
  • Flexibility: Accepts input at any sampling rate from 8 kHz to 48 kHz, with no specific input format required.
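To put the speed figures above in concrete terms: "N x realtime" means N seconds of audio are processed per second of compute. The short sketch below applies that arithmetic to one hour of audio, using the 5000x and 50x figures from the list. The function name `processing_seconds` is illustrative, and actual throughput will depend on hardware and batch size.

```python
def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Compute time needed for `audio_seconds` of audio at `speedup` x realtime."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    one_hour = 3600.0
    for label, speedup in (("GPU, 5000x", 5000.0), ("CPU, 50x", 50.0)):
        seconds = processing_seconds(one_hour, speedup)
        print(f"{label}: {seconds:.2f} s to process one hour of audio")
```

At these rates, an hour of audio takes under a second on a GPU and a little over a minute on a CPU.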

Maintenance & Community

The project is under active development, and an Interspeech paper detailing its technical contributions is in preparation. Community integrations include a dedicated ComfyUI node and a GUI application. Direct inquiries can be sent to yatharthsharma3501@gmail.com.

Licensing & Compatibility

LavaSR is released under the Apache-2.0 license. This permissive license allows broad adoption, including commercial use and integration into closed-source applications, subject to the license's attribution and notice requirements.

Limitations & Caveats

The training code has not yet been released, and models optimized specifically for music and general audio are still pending.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 46 stars in the last 30 days
