LIHQ by johnGettings

AI presenter for generating synthetic speaker videos

Created 3 years ago
261 stars

Top 98.0% on sourcepulse

Project Summary

LIHQ is an application designed to generate high-quality AI presenter videos, targeting users who want to create synthetic speakers with custom faces and voices. It simplifies the complex process of AI video generation by integrating multiple open-source deep learning models, making advanced capabilities accessible with minimal setup, particularly within Google Colab environments.

How It Works

LIHQ orchestrates a pipeline of specialized models to achieve its results. It begins with a First Order Motion Model (FOMM) to transfer head and eye movements from a reference video to a user-provided face image. Subsequently, Wav2Lip synchronizes mouth movements with user-provided audio, overlaying this onto the FOMM output. The low-resolution output is then enhanced using GFPGAN for face restoration and upscaling, with an optional second pass for improved quality. Advanced options include frame interpolation for higher FPS and background matting.
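The staged pipeline described above can be sketched as a simple orchestrator. This is a hedged illustration only: the function name `run_pipeline`, the stage labels, and the keyword arguments are hypothetical and do not reflect LIHQ's actual API; they just encode the ordering the summary describes.

```python
# Illustrative sketch of the LIHQ stage ordering (names are hypothetical,
# not the project's real API).

def run_pipeline(face_image, ref_video, audio,
                 second_gfpgan_pass=False,
                 interpolate=False,
                 matte_background=False):
    """Return the ordered list of model stages, per the pipeline description."""
    stages = []
    # 1. FOMM transfers head/eye motion from ref_video onto face_image.
    stages.append("fomm")
    # 2. Wav2Lip syncs mouth movement to the audio, overlaid on the FOMM output.
    stages.append("wav2lip")
    # 3. GFPGAN restores and upscales the low-resolution result.
    stages.append("gfpgan")
    if second_gfpgan_pass:
        stages.append("gfpgan")   # optional second restoration pass
    if interpolate:
        stages.append("qvi")      # optional frame interpolation for higher FPS
    if matte_background:
        stages.append("modnet")   # optional background matting
    return stages
```

With all options off, the pipeline reduces to the three core stages (FOMM, Wav2Lip, GFPGAN); each optional flag appends its corresponding model.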

Quick Start & Requirements

  • Install/Run: Primarily designed for Google Colab notebooks.
  • Prerequisites: Google Colab environment with GPU access. Recommended inputs: a StyleGAN2-generated face image and a synthesized narrator voice (TorToiSe or Bark).
  • Setup: Designed for zero setup within Colab.

Highlighted Details

  • Integrates FOMM, Wav2Lip, GFPGAN, and optionally QVI and MODNet.
  • Optimized for Google Colab for accessibility and cost-effectiveness.
  • Supports TorToiSe TTS and suggests Suno AI's Bark as a state-of-the-art alternative.
  • Prioritizes StyleGAN2 faces and simple narrator voices for best results.

Maintenance & Community

  • Project initiated and maintained by johnGettings.
  • No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The underlying models used (StyleGAN2, TorToiSe, Wav2Lip, GFPGAN, Bark) have their own licenses, which may include restrictions on commercial use or redistribution.

Limitations & Caveats

The project is designed primarily for Colab; local execution is noted as experimental. Achieving optimal results requires trial and error with specific face images and audio clips, with StyleGAN2 faces and simple narrator voices yielding the best output. Some features, such as frame interpolation, significantly increase inference time.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
