hifi-gan by jik876

GAN for high-fidelity speech synthesis

Created 5 years ago

2,320 stars

Top 19.2% on SourcePulse

2 Experts Love This Project

chenlin9 (Cofounder of Pika)

Andreas Jansson (Cofounder of Replicate)

Project Summary

HiFi-GAN is a Generative Adversarial Network (GAN) for efficient, high-fidelity speech synthesis. It targets speech-technology researchers and developers who need natural-sounding audio waveforms generated much faster than real time. The authors report that it outperforms autoregressive and flow-based models in both speed and quality.

How It Works

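HiFi-GAN rests on the observation that speech is composed of sinusoidal signals with various periods, so modeling these periodic patterns is crucial for sample quality. Its multi-period discriminator captures them by folding the 1D waveform into 2D, so that each sub-discriminator sees samples spaced exactly one period apart. Because the generator is a non-autoregressive GAN, sampling is fast and uses less memory than autoregressive or flow-based alternatives, and the authors report subjective quality close to that of human recordings.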
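To make the periodic-pattern idea concrete, here is a minimal pure-Python sketch of the reshaping step behind the multi-period discriminator: the waveform is reflect-padded to a multiple of a period p and folded into rows of length p, so each column holds samples spaced exactly p apart. The helper names `reshape_by_period` and `column` are hypothetical; the repository works on PyTorch tensors, and the period set (2, 3, 5, 7, 11) is the one described in the HiFi-GAN paper.

```python
from typing import List, Sequence

PERIODS = (2, 3, 5, 7, 11)  # periods described in the HiFi-GAN paper


def reshape_by_period(signal: Sequence[float], period: int) -> List[List[float]]:
    """Fold a 1D waveform into rows of length `period` (hypothetical helper).

    Column j of the result holds samples j, j + period, j + 2*period, ...,
    i.e. samples exactly one period apart. A discriminator whose kernels have
    width 1 along the period axis therefore only mixes same-phase samples.
    """
    if period < 1:
        raise ValueError("period must be positive")
    samples = list(signal)
    n = len(samples)
    remainder = n % period
    if remainder:
        n_pad = period - remainder
        if n_pad > n - 1:
            raise ValueError("signal too short to reflect-pad")
        # Reflect padding mirrors the tail without repeating the last sample
        # (like PyTorch's "reflect" mode): [..., 4, 5, 6] -> [..., 4, 5, 6, 5, 4].
        samples += [samples[n - 2 - i] for i in range(n_pad)]
    rows = len(samples) // period
    return [samples[r * period:(r + 1) * period] for r in range(rows)]


def column(rows: List[List[float]], j: int) -> List[float]:
    """Return every period-th sample starting at offset j."""
    return [row[j] for row in rows]


if __name__ == "__main__":
    rows = reshape_by_period([0, 1, 2, 3, 4, 5, 6], 3)
    print(rows)              # [[0, 1, 2], [3, 4, 5], [6, 5, 4]]
    print(column(rows, 0))   # [0, 3, 6]
```

Using prime periods keeps the sub-discriminators' views from overlapping much, which is the design rationale the paper gives for that choice.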

Quick Start & Requirements

Install via pip install -r requirements.txt.
Requires Python >= 3.6.
Download and prepare the LJ Speech dataset.
Training command: python train.py --config config_v1.json.
Pretrained models are available for various datasets and fine-tuning scenarios.
Demo website: referenced in the README for audio samples (link not captured in this summary).
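The training command reads its hyperparameters from `config_v1.json`. A useful sanity check before training on a new dataset is that the generator's upsampling factors multiply to the mel-spectrogram hop size, since each mel frame must expand into exactly `hop_size` waveform samples. The sketch below assumes key names and values as we recall them from `config_v1.json` (`upsample_rates`, `hop_size`, `sampling_rate`); verify them against your copy. `check_config` is a hypothetical helper, not part of the repository.

```python
import json
from math import prod
from typing import Any, Dict, List

# Excerpt of a HiFi-GAN-style config. Key names and values are assumptions
# recalled from config_v1.json; check them against the actual file.
CONFIG_EXCERPT = """
{
  "upsample_rates": [8, 8, 2, 2],
  "hop_size": 256,
  "sampling_rate": 22050
}
"""


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of consistency problems (empty if none were found)."""
    problems = []
    # Each mel frame is upsampled by every rate in turn, so the product
    # must equal the number of waveform samples per frame (the hop size).
    total_upsampling = prod(cfg["upsample_rates"])
    if total_upsampling != cfg["hop_size"]:
        problems.append(
            f"upsample_rates multiply to {total_upsampling}, "
            f"but hop_size is {cfg['hop_size']}"
        )
    return problems


if __name__ == "__main__":
    cfg = json.loads(CONFIG_EXCERPT)
    print(check_config(cfg))
    frames_per_second = cfg["sampling_rate"] / cfg["hop_size"]
    print(f"{frames_per_second:.2f} mel frames per second")
```

If you change `hop_size` for a different dataset, `upsample_rates` has to change with it, or the generated waveform length will no longer match the target audio.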

Highlighted Details

Generates 22.05 kHz high-fidelity audio 167.9x faster than real-time on a single V100 GPU.
A small-footprint version runs 13.4x faster than real-time on CPU while matching the quality of an autoregressive counterpart.
Offers a universal model for transfer learning to new datasets.
Supports fine-tuning with mel-spectrograms generated by models like Tacotron2.
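The speed figures above are real-time factors: a model running N times faster than real time needs 1/N seconds of compute per second of audio. This short sketch turns the reported 167.9x (V100 GPU) and 13.4x (CPU, small model) figures into per-second latency and sample throughput at 22.05 kHz. It is plain arithmetic on the published numbers, not a benchmark, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 22_050  # output rate quoted for the speed results


def compute_seconds_per_audio_second(speedup: float) -> float:
    """Wall-clock seconds needed to synthesize one second of audio."""
    return 1.0 / speedup


def samples_per_wall_second(speedup: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Waveform samples produced per second of wall-clock time."""
    return sample_rate_hz * speedup


if __name__ == "__main__":
    for label, speedup in (("V100 GPU", 167.9), ("CPU, small model", 13.4)):
        ms = compute_seconds_per_audio_second(speedup) * 1000
        rate = samples_per_wall_second(speedup)
        print(f"{label}: {ms:.2f} ms per second of audio, {rate:,.0f} samples/s")
```

For the GPU figure this works out to roughly 6 ms of compute per second of speech; the CPU figure is about 75 ms.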

Maintenance & Community

The implementation draws on code from WaveGlow, MelGAN, and Tacotron2, which the README credits as references.
No specific community links (Discord, Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Because the code draws on other projects (WaveGlow, MelGAN, Tacotron2), users should check the repository and those projects for license terms before commercial use.

Limitations & Caveats

The README does not specify a license, which may impact commercial adoption.
Community support and active maintenance status are not detailed.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Pull Requests (30d): 0

Issues (30d): 0

Star History: 12 stars in the last 30 days

Explore Similar Projects

Starred by Alexander Borzunov (Research Scientist at OpenAI).

SpecVQGAN by v-iashin

Research code for visually guided sound generation via codebook sampling

Created 4 years ago

Updated 1 year ago

Starred by Andreas Jansson (Cofounder of Replicate) and Piotr Dąbkowski (Cofounder of ElevenLabs).

univnet by maum-ai

PyTorch implementation of UnivNet vocoder for high-fidelity waveform generation

Created 4 years ago

Updated 4 years ago

wavegrad by lmnt-com

Neural vocoder for high-quality waveform generation from spectrograms

Created 5 years ago

Updated 2 years ago

VocGAN by rishikksh20

Real-time vocoder using a hierarchically-nested adversarial network

Created 5 years ago

Updated 1 year ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

FastDiff by Rongjiehuang

PyTorch implementation for fast, high-fidelity speech synthesis via conditional diffusion

Created 4 years ago

Updated 1 year ago

Starred by Casper Hansen (Author of AutoAWQ).

melgan by seungwonpark

PyTorch implementation of MelGAN vocoder

Created 6 years ago

Updated 5 years ago

tacotronv2_wavernn_chinese by lturing

TTS pipeline for Chinese speech synthesis

Created 5 years ago

Updated 2 years ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

AudioLDM2 by haoheliu

CLI tool for text-conditional audio/music generation

Created 2 years ago

Updated 1 year ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Omar Sanseviero (DevRel at Google DeepMind).

AudioLDM by haoheliu

Audio generation research paper using latent diffusion

Created 3 years ago

Updated 8 months ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs) and Omar Sanseviero (DevRel at Google DeepMind).

audiolm-pytorch by lucidrains

PyTorch implementation of Google's AudioLM for audio generation

Created 3 years ago

Updated 1 year ago

Starred by Omar Sanseviero (DevRel at Google DeepMind), Casper Hansen (Author of AutoAWQ), and 1 more.

ParallelWaveGAN by kan-bayashi

PyTorch vocoder for real-time speech synthesis, based on Parallel WaveGAN

Created 6 years ago

Updated 1 year ago

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab

Toolkit for audio, music, and speech generation research

Created 2 years ago

Updated 9 months ago
