hifi-gan by jik876

GAN for high-fidelity speech synthesis

Created 5 years ago
2,225 stars

Top 20.4% on SourcePulse

Project Summary

HiFi-GAN is a generative adversarial network (GAN) for efficient, high-fidelity speech synthesis. It is aimed at researchers and developers in speech technology who need natural-sounding audio waveforms generated much faster than real time. The authors report that it outperforms the autoregressive and flow-based models they compared against in both synthesis speed and sample quality.

How It Works

HiFi-GAN uses adversarial training to model the periodic patterns in speech audio, which the authors identify as crucial for sample quality. Because the generator is a GAN rather than an autoregressive or flow-based model, sampling is efficient and memory usage is lower. In the authors' subjective listening evaluation, the generated audio is rated close to human recordings.
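One way to picture the "periodic patterns" idea: in the HiFi-GAN paper, a multi-period discriminator folds the 1D waveform into a 2D grid of width p, so that each column contains samples spaced exactly p apart. The sketch below illustrates only that folding step in plain Python. The function name reshape_by_period is hypothetical, and the zero padding is a simplification chosen for this example, not the repository's code.

```python
# Periods used by the multi-period discriminator in the HiFi-GAN paper.
PERIODS = (2, 3, 5, 7, 11)


def reshape_by_period(samples, period, pad_value=0.0):
    """Fold a 1D waveform into rows of `period` samples each.

    Column k of the result holds samples k, k + period, k + 2 * period, ...
    so anything that scans down a column sees equally spaced samples.
    Zero padding is an assumption made for this illustration.
    """
    if period < 1:
        raise ValueError("period must be a positive integer")
    padded = list(samples)
    remainder = len(padded) % period
    if remainder:
        # Pad the tail so the length is a multiple of the period.
        padded.extend([pad_value] * (period - remainder))
    return [padded[i:i + period] for i in range(0, len(padded), period)]


if __name__ == "__main__":
    waveform = list(range(10))  # stand-in for audio samples
    grid = reshape_by_period(waveform, 3)
    first_column = [row[0] for row in grid]
    print(first_column)  # samples 0, 3, 6, 9: every third sample
```

Each period in PERIODS yields a different grid, so a set of sub-discriminators can each inspect a different spacing of the same audio.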

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires Python >= 3.6.
  • Download and prepare the LJ Speech dataset.
  • Train: python train.py --config config_v1.json
  • Pretrained models are available for various datasets and fine-tuning scenarios.
  • Demo website: mentioned in the README; the link is not reproduced in this summary.
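The steps above can be wrapped in a small preflight script. This is a minimal sketch, assuming it is run from the root of a cloned repository; the helper names python_ok and quick_start_commands are hypothetical, and the script only prints the commands rather than executing them.

```python
import sys

MIN_PYTHON = (3, 6)  # the minimum version listed in the requirements above


def python_ok(version_info=None):
    """Return True if the interpreter meets the stated minimum version."""
    version = tuple((version_info or sys.version_info)[:2])
    return version >= MIN_PYTHON


def quick_start_commands(config="config_v1.json"):
    """Build the install and training commands as argument lists."""
    install = [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]
    train = [sys.executable, "train.py", "--config", config]
    return [install, train]


if __name__ == "__main__":
    if not python_ok():
        raise SystemExit("Python >= 3.6 is required")
    for command in quick_start_commands():
        # Printed for review; pass each list to subprocess.run to execute it.
        print(" ".join(command))
```

Building the commands as lists keeps them ready for subprocess.run without shell quoting concerns.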

Highlighted Details

  • Generates 22.05 kHz high-fidelity audio 167.9x faster than real-time on a single V100 GPU.
  • A small-footprint version runs 13.4x faster than real-time on CPU with quality comparable to an autoregressive counterpart.
  • Offers a universal model for transfer learning to new datasets.
  • Supports fine-tuning with mel-spectrograms generated by models like Tacotron2.
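To make the speed figures concrete, the arithmetic below converts a "times faster than real-time" factor into samples generated per second and milliseconds of compute per second of audio. It is a worked calculation from the numbers quoted above, not a benchmark; the function name throughput is hypothetical.

```python
SAMPLE_RATE_HZ = 22050  # 22.05 kHz output, as quoted above


def throughput(speedup, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a real-time speedup factor into two derived figures.

    Returns (samples generated per wall-clock second,
             milliseconds of compute per second of audio).
    """
    samples_per_second = speedup * sample_rate_hz
    ms_per_audio_second = 1000.0 / speedup
    return samples_per_second, ms_per_audio_second


for label, speedup in [("V100 GPU", 167.9), ("CPU", 13.4)]:
    sps, ms = throughput(speedup)
    print(f"{label}: {sps:,.0f} samples/s, {ms:.2f} ms per second of audio")
# V100 GPU: 3,702,195 samples/s, 5.96 ms per second of audio
# CPU: 295,470 samples/s, 74.63 ms per second of audio
```

In other words, at the reported GPU speed one second of speech takes roughly 6 ms to synthesize, and at the reported CPU speed roughly 75 ms.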

Maintenance & Community

  • The implementation draws on the WaveGlow, MelGAN, and Tacotron2 codebases.
  • No specific community links (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • The README does not state a license. Because the implementation draws on other projects, check the repository's license terms, and those of the referenced projects, before commercial use.

Limitations & Caveats

  • With no license stated in the README, commercial adoption carries uncertainty until the repository's terms are verified.
  • Community support and active maintenance status are not detailed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 20 stars in the last 30 days
