yl4579/HiFTNet: Fast, high-quality neural vocoder for speech synthesis
Summary
HiFTNet is a neural vocoder designed for fast, high-quality speech synthesis from mel-spectrograms. It addresses the computational and parameter inefficiencies of prior GAN-based models like HiFi-GAN and BigVGAN. Targeting researchers and developers in speech synthesis, HiFTNet offers a significant speed-up and parameter reduction while achieving state-of-the-art or ground-truth-level audio quality, enabling real-time applications.
How It Works
HiFTNet extends the iSTFTNet architecture by incorporating a novel harmonic-plus-noise source filter operating in the time-frequency domain. This filter leverages a sinusoidal source derived from a fundamental frequency (F0) estimated by a pre-trained network. This design choice allows for rapid inference, significantly reducing computational load and model size compared to traditional GAN vocoders, while maintaining high fidelity.
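The idea of a sinusoidal source driven by F0 can be illustrated with a minimal NumPy sketch. This is not the HiFTNet implementation: it only builds a time-domain harmonic-plus-noise excitation, whereas HiFTNet applies its source filter in the time-frequency domain. The function name, the default sample rate and hop length, the number of harmonics, and the amplitude constants are all assumptions chosen for illustration.

```python
import numpy as np

def harmonic_plus_noise_source(f0_frames, sample_rate=24000, hop_length=300,
                               num_harmonics=8, sine_amp=0.1, noise_std=0.003,
                               seed=0):
    """Build a sine-based excitation from a frame-level F0 contour.

    f0_frames: 1-D sequence of F0 values in Hz, one per frame; 0 marks unvoiced.
    Returns a 1-D float array of length len(f0_frames) * hop_length.
    All default values are illustrative assumptions, not HiFTNet settings.
    """
    rng = np.random.default_rng(seed)
    f0_frames = np.asarray(f0_frames, dtype=np.float64)

    # Upsample F0 from frame rate to sample rate by holding each frame value.
    f0 = np.repeat(f0_frames, hop_length)
    voiced = f0 > 0

    # Integrate the instantaneous frequency to get the fundamental's phase.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)

    # Sum sinusoids at integer multiples of F0, dropping any above Nyquist.
    harmonics = np.zeros_like(f0)
    for k in range(1, num_harmonics + 1):
        below_nyquist = (k * f0) < (sample_rate / 2)
        harmonics += np.sin(k * phase) * below_nyquist
    harmonics *= sine_amp / num_harmonics

    # Voiced samples: harmonics plus a little noise. Unvoiced: noise only.
    noise = rng.standard_normal(f0.shape)
    return np.where(voiced,
                    harmonics + noise_std * noise,
                    (sine_amp / 3.0) * noise)

# Two unvoiced frames followed by two voiced frames at 200 Hz.
source = harmonic_plus_noise_source([0.0, 0.0, 200.0, 200.0])
print(source.shape)
```

Because the excitation already carries the pitch and harmonic structure, the network that follows has less to generate from scratch, which is one plausible reading of why this design permits a smaller and faster model.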
Quick Start & Requirements
Clone the repository (git clone https://github.com/yl4579/HiFTNet.git), navigate into the directory, and install the Python requirements (pip install -r requirements.txt). For F0 model training, see yl4579/PitchExtractor. For inference, see inference.ipynb. Audio samples can be found at https://hiftnet.github.io/. The research paper is available at https://arxiv.org/abs/2309.09493.
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps are present in the provided README.
Licensing & Compatibility
The README does not specify a software license. Licensing terms should be clarified with the author before adoption, particularly for commercial use or integration into proprietary systems.
Limitations & Caveats
The vocoder's performance is critically dependent on the accuracy of the fundamental frequency (F0) estimation. For optimal results, especially with noisy audio or non-speech content, training a dedicated F0 model is recommended.