hallo2 by fudan-generative-vision

Audio-driven portrait animation for long durations and high resolutions

created 9 months ago
3,604 stars

Top 13.7% on sourcepulse

Project Summary

Hallo2 is an open-source project for generating long-duration, high-resolution portrait animations driven by audio. It targets researchers and developers in AI-driven media synthesis, offering a solution for creating realistic talking head videos from static images and audio inputs.

How It Works

Hallo2 employs a diffusion-based approach, leveraging a UNet architecture for denoising. It integrates multiple specialized models for face analysis, motion generation, and audio processing. The system processes input images and audio to generate synchronized facial movements and expressions, with an optional super-resolution module for enhanced output quality.
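
To make the denoising idea concrete, below is a minimal sketch of how an audio-conditioned diffusion denoising loop is typically structured. The module names, shapes, and update rule are illustrative placeholders only, not Hallo2's actual architecture or API; the real pipeline additionally wires in its face-analysis, motion-generation, and super-resolution models.

```python
# Illustrative sketch of an audio-conditioned diffusion denoising loop.
# All names, shapes, and the update rule are placeholders, not Hallo2's code.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Stand-in for the denoising UNet; the real model is far larger."""
    def __init__(self, channels=4, audio_dim=32):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, channels)
        self.net = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, latents, t, audio_emb):
        # Condition the latents on the audio embedding (broadcast over H, W).
        cond = self.audio_proj(audio_emb)[:, :, None, None]
        return self.net(latents + cond)

@torch.no_grad()
def denoise(unet, audio_emb, steps=25, size=(1, 4, 64, 64)):
    """Iteratively denoise from Gaussian noise, conditioned on audio."""
    latents = torch.randn(size)
    for t in reversed(range(steps)):
        noise_pred = unet(latents, t, audio_emb)
        latents = latents - noise_pred / steps  # placeholder update rule
    return latents  # in a real pipeline, decoded to video frames by a VAE

unet = TinyUNet()
frames_latent = denoise(unet, audio_emb=torch.randn(1, 32))
print(frames_latent.shape)
```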

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n hallo python=3.10), activate it, install PyTorch (pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118), install requirements (pip install -r requirements.txt), and install ffmpeg (apt-get install ffmpeg).
  • Pretrained Models: Download from HuggingFace (huggingface-cli download fudan-generative-ai/hallo2 --local-dir ./pretrained_models).
  • System Requirements: Ubuntu 20.04/22.04, CUDA 11.8, tested on A100 GPUs.
  • Input Data: Square-cropped images (face 50-70% of image, <30° rotation), WAV audio (English, clear vocals).
  • Inference: python scripts/inference_long.py --config ./configs/inference/long.yaml for long-duration animation; python scripts/video_sr.py --input_path [input_video] --output_path [output_dir] for high-resolution output (see the sketch after this list).
  • Links: Project page (not provided), Paper (arXiv:2410.07718).
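
The following is a hypothetical batch driver that chains the two documented inference commands. The script paths and flags come from the repository's usage above; the config path, output video location, and output directory are assumptions made for the example.

```python
# Hypothetical driver chaining long-duration inference with super-resolution.
# Script names and flags follow the documented commands; paths are assumed.
import subprocess
from pathlib import Path

def animate_and_upscale(config="./configs/inference/long.yaml",
                        raw_video="./output/animation.mp4",
                        upscaled_dir="./output/4k"):
    # Step 1: long-duration, audio-driven portrait animation.
    subprocess.run(
        ["python", "scripts/inference_long.py", "--config", config],
        check=True,
    )
    # Step 2: optional super-resolution pass on the generated video.
    Path(upscaled_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "scripts/video_sr.py",
         "--input_path", raw_video,
         "--output_path", upscaled_dir],
        check=True,
    )

if __name__ == "__main__":
    animate_and_upscale()
```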

Highlighted Details

  • Supports long-duration (up to 1 hour) and high-resolution (4K) video generation.
  • Demonstrates its capabilities with demo speeches from Taylor Swift, Johan Rockström, and Churchill.
  • The roadmap lists paper submission and code release as completed milestones.
  • Offers training scripts for both long-duration and high-resolution animation.

Maintenance & Community

  • Paper accepted to ICLR 2025.
  • Source code and pretrained weights released October 2024.
  • Open research positions available at Fudan University.

Licensing & Compatibility

  • The high-resolution animation feature is under the S-Lab License 1.0. Other components' licenses are not explicitly stated but appear to be permissive, given the open-source release.

Limitations & Caveats

  • Audio driving is limited to English due to training data constraints.
  • The project acknowledges social risks related to deepfakes and privacy, emphasizing ethical guidelines.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 60 stars in the last 90 days
