NovaSR by ysharma3501

Audio super-resolution model for extreme efficiency

Created 6 months ago

772 stars

Top 44.5% on SourcePulse

Project Summary

NovaSR is an audio upsampling model designed for extreme efficiency, capable of transforming muffled 16kHz audio into clear 48kHz audio. It targets developers and users who require real-time audio enhancement, dataset restoration, or quality improvements for TTS models with minimal computational overhead. The primary benefit is achieving high-fidelity audio upscaling at speeds exceeding 3500x realtime with a model size of approximately 52KB.
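A realtime factor of N means N seconds of audio are processed per second of compute. The sketch below works through that arithmetic for the 16 kHz to 48 kHz case. It is plain Python, and the helper names are illustrative only, not part of NovaSR.

```python
SOURCE_SR = 16_000  # input sample rate (Hz), per the summary
TARGET_SR = 48_000  # output sample rate (Hz), per the summary


def upsample_factor(src_sr: int, dst_sr: int) -> int:
    """Integer ratio between output and input sample rates."""
    if dst_sr % src_sr:
        raise ValueError("target rate must be an integer multiple of source rate")
    return dst_sr // src_sr


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time needed at a given realtime factor (audio duration / compute time)."""
    return audio_seconds / realtime_factor


factor = upsample_factor(SOURCE_SR, TARGET_SR)  # 3 output samples per input sample
one_hour = 3600.0                               # seconds of input audio
t = processing_seconds(one_hour, 3500)          # about 1.03 s for an hour of audio
print(factor, round(t, 2))
```

At the claimed speed, an hour of 16 kHz audio would be upsampled in roughly one second, ignoring file I/O and model loading.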

How It Works

NovaSR uses a highly optimized architecture of fewer than 10 one-dimensional convolutional layers (conv1d) combined with snake activation functions, an approach inspired by BigVGAN. This minimalist design aims to maximize audio quality within a very small footprint, which is what makes its speed and low memory usage possible.
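To illustrate the two building blocks named above, here is a minimal NumPy sketch of a snake activation, x + (1/α)·sin²(αx), and a "same"-padded conv1d. The two-layer `toy_upsampler`, its layer sizes, the scalar α, and the phase-interleaving output step are all hypothetical choices made for illustration. This is not NovaSR's actual architecture or weights, which the summary does not describe in that detail.

```python
import numpy as np


def snake(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Snake activation: x + (1/alpha) * sin^2(alpha * x)."""
    return x + (1.0 / alpha) * np.sin(alpha * x) ** 2


def conv1d(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """'Same'-padded 1D cross-correlation, as computed by deep-learning conv1d layers.

    x: (in_ch, T), weight: (out_ch, in_ch, k) with odd k, bias: (out_ch,).
    Returns an array of shape (out_ch, T).
    """
    out_ch, in_ch, k = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    y = np.zeros((out_ch, T))
    for o in range(out_ch):
        for i in range(in_ch):
            for j in range(k):
                y[o] += weight[o, i, j] * xp[i, j:j + T]
        y[o] += bias[o]
    return y


def toy_upsampler(x16k: np.ndarray, layers) -> np.ndarray:
    """Hypothetical conv1d + snake net mapping T input samples to 3*T output samples."""
    (w1, b1), (w2, b2) = layers
    h = snake(conv1d(x16k[None, :], w1, b1))  # (hidden, T)
    phases = conv1d(h, w2, b2)                 # (3, T): one channel per output phase
    return phases.T.reshape(-1)                # interleave phases -> length 3*T


rng = np.random.default_rng(0)
hidden, k = 8, 7
layers = [
    (rng.normal(scale=0.1, size=(hidden, 1, k)), np.zeros(hidden)),
    (rng.normal(scale=0.1, size=(3, hidden, k)), np.zeros(3)),
]
x = rng.normal(size=160)      # 10 ms of audio at 16 kHz
y = toy_upsampler(x, layers)  # 480 samples, i.e. 10 ms at 48 kHz
print(x.shape, y.shape)
```

The snake term adds a periodic component to an otherwise linear response, which is why it is popular in audio generators. Frameworks such as BigVGAN typically learn α per channel rather than fixing a single scalar as this sketch does.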

Quick Start & Requirements

Installation: pip install git+https://github.com/ysharma3501/NovaSR.git
Prerequisites: Python, plus network access, because the model weights are downloaded from Hugging Face. For CPU inference, initialize the model as FastSR(half=False), i.e. without half precision.
Resources: Benchmarks were run on an A100 GPU, reaching over 3600x realtime. On CPU, the model is also reported to run significantly faster than other models.
Links:
- Hugging Face Model: referenced but not linked in the README
- Hugging Face Spaces: referenced but not linked in the README
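The summary does not say how the realtime figures were measured. One common approach is sketched below: time any upsampling callable on a fixed clip and divide the clip duration by the best wall-clock time. `naive_upsample` is a placeholder (sample repetition), not NovaSR. To benchmark NovaSR you would wrap its inference call in a function passed as `upsample`. That call's exact name is not given in the summary, so it is left out here.

```python
import time

import numpy as np


def realtime_factor(upsample, audio: np.ndarray, sample_rate: int, repeats: int = 5) -> float:
    """Seconds of input audio processed per second of wall-clock time (best of `repeats`)."""
    duration = len(audio) / sample_rate
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        upsample(audio)
        best = min(best, time.perf_counter() - start)
    return duration / max(best, 1e-9)  # guard against a zero-length timing


def naive_upsample(audio: np.ndarray) -> np.ndarray:
    """Placeholder stand-in: repeats each sample 3x (16 kHz length -> 48 kHz length)."""
    return np.repeat(audio, 3)


audio = np.zeros(16_000 * 10, dtype=np.float32)  # 10 s of silence at 16 kHz
print(f"{realtime_factor(naive_upsample, audio, 16_000):.0f}x realtime")
```

Taking the best of several runs reduces noise from warm-up and scheduling. For GPU models, kernels launch asynchronously, so the timed callable should synchronize before returning (for example with `torch.cuda.synchronize()` in PyTorch). Otherwise the measured time will be too short.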

Highlighted Details

Achieves over 3600x realtime processing speed on a single A100 GPU.
Model size is approximately 52 KB, thousands of times smaller than comparable models.
Audio quality is claimed to be on par with models 5,000x larger.
Trained on 100 hours of data (mls_sidon and vctk).
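As a rough sanity check on the size claims, the following sketch converts the reported ~52 KB into an approximate weight count and into the size of a model 5,000x larger. It assumes decimal kilobytes and ignores checkpoint-format overhead. The storage precision (float32 or float16) is not stated in the summary, so both are shown.

```python
def param_count(size_bytes: int, bytes_per_param: int) -> int:
    """Approximate number of weights stored in a checkpoint of the given size."""
    return size_bytes // bytes_per_param


SIZE_BYTES = 52 * 1000  # ~52 KB as reported; decimal kilobytes assumed

fp32_params = param_count(SIZE_BYTES, 4)   # 13,000 weights if stored as float32
fp16_params = param_count(SIZE_BYTES, 2)   # 26,000 weights if stored as float16
larger_model_mb = SIZE_BYTES * 5000 / 1e6  # size of a model 5,000x larger, in MB
print(fp32_params, fp16_params, larger_model_mb)
```

This works out to tens of thousands of weights at most, while a model 5,000x larger would be about 260 MB.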

Maintenance & Community

The primary contact is ysharma3501@gmail.com. The model is still being trained, and additional benchmarking is planned. The README provides no community channels (such as Discord or Slack) or roadmap links.

Licensing & Compatibility

The README does not specify a license. Confirm the licensing terms before any commercial use or integration into closed-source projects.

Limitations & Caveats

Comprehensive benchmarks are still pending. The project is under active development with ongoing training, so its results and behavior may change. The README does not document unsupported platforms or known bugs.

Explore Similar Projects

HiFTNet by yl4579

LinaCodec by ysharma3501

LavaSR by ysharma3501

stable-audio-3 by Stability-AI

MOVA by OpenMOSS

HunyuanVideo-Foley by Tencent-Hunyuan

FunMusic by FunAudioLLM

JoyAI-Echo by jd-opensource

AudioLDM2 by haoheliu

MMAudio by hkchengrex

AudioLDM by haoheliu

audiolm-pytorch by lucidrains