NovaSR  by ysharma3501

Audio super-resolution model for extreme efficiency

Created 1 month ago
728 stars

Top 47.3% on SourcePulse

Project Summary

NovaSR is an audio upsampling model designed for extreme efficiency. It converts muffled 16 kHz audio into clear 48 kHz audio. It targets developers and users who need real-time audio enhancement, dataset restoration, or quality improvements for TTS output with minimal computational overhead. The primary benefit is high-fidelity upscaling at over 3600x realtime (benchmarked on an A100 GPU) with a model size of approximately 52 KB.

How It Works

NovaSR uses a deliberately small architecture: fewer than 10 one-dimensional convolutional layers (conv1d) combined with snake activation functions, inspired by BigVGAN. This minimalist design aims for the best audio quality achievable within a very small footprint, which is what makes the reported speed and low memory usage possible.
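
To make the building blocks above concrete, here is a minimal pure-Python sketch of a single conv1d layer followed by a snake activation. It is not NovaSR's code: the function names, the single-channel layout, the zero padding, and the fixed `alpha` are all assumptions chosen for illustration. The snake formula used is the common form x + (1/alpha) * sin^2(alpha * x).

```python
import math

def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + (1/alpha) * sin^2(alpha * x).

    alpha is a fixed constant here (an assumption for illustration).
    """
    return x + (math.sin(alpha * x) ** 2) / alpha

def conv1d(signal, kernel):
    """Single-channel 1-D cross-correlation with zero padding.

    Assumes an odd kernel length so the output has the same
    length as the input.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def conv_snake_layer(signal, kernel, alpha: float = 1.0):
    """Hypothetical layer: convolution followed by snake, element-wise."""
    return [snake(v, alpha) for v in conv1d(signal, kernel)]
```

The periodic term lets a small network represent oscillating signals such as audio more easily than a purely monotonic activation, which is one plausible reason for pairing it with so few layers.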

Quick Start & Requirements

  • Installation: pip install git+https://github.com/ysharma3501/NovaSR.git
  • Prerequisites: Python. The model downloads weights from Hugging Face. A CPU-optimized version is available by initializing FastSR(half=False).
  • Resources: Benchmarks were conducted on an A100 GPU, achieving over 3600x realtime. CPU performance is also noted as significantly faster than other models.
  • Links:
    • Hugging Face Model: [Link not provided in README, but implied]
    • Hugging Face Spaces: [Link not provided in README, but implied]

Highlighted Details

  • Achieves over 3600x realtime processing speed on a single A100 GPU.
  • Model size is approximately 52 KB, thousands of times smaller than comparable models.
  • Audio quality is claimed to be on par with models 5,000x larger.
  • Trained on 100 hours of data (mls_sidon and vctk).
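
The headline numbers above can be put in perspective with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper names are hypothetical, 1 KB is taken as 1024 bytes, and the parameter estimate assumes the file holds nothing but weights stored at a fixed width (2 bytes for half precision, 4 bytes for single precision). None of these assumptions come from the README.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to process audio at a given realtime factor."""
    return audio_seconds / realtime_factor

def output_samples(input_samples: int, in_rate: int, out_rate: int) -> int:
    """Number of samples after resampling from in_rate to out_rate."""
    return input_samples * out_rate // in_rate

def approx_param_count(model_bytes: int, bytes_per_param: int) -> int:
    """Rough parameter count, assuming the file contains only weights."""
    return model_bytes // bytes_per_param

# One hour of audio at the reported 3600x realtime.
one_hour = processing_seconds(3600, 3600)

# One second of 16 kHz audio upsampled to 48 kHz.
samples_48k = output_samples(16_000, 16_000, 48_000)

# A file of about 52 KB under the two storage assumptions.
params_fp16 = approx_param_count(52 * 1024, 2)
params_fp32 = approx_param_count(52 * 1024, 4)
```

Under these assumptions, an hour of audio takes about one second on the benchmarked GPU, each input second yields three times as many output samples, and the model holds on the order of ten to thirty thousand parameters.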

Maintenance & Community

The primary contact is ysharma3501@gmail.com. The model is still being trained, and additional benchmarking is planned. The README provides no community channels (such as Discord or Slack) and no roadmap links.

Licensing & Compatibility

The license type is not specified in the provided README. This omission requires further investigation for commercial use or integration into closed-source projects.

Limitations & Caveats

Comprehensive benchmarks are still pending. The project appears to be under active development, with ongoing training and potential for future improvements or changes. Specific limitations regarding unsupported platforms or known bugs are not detailed.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5
  • Star history: 83 stars in the last 30 days
