hallo3 by fudan-generative-vision

Research paper for portrait image animation via video diffusion transformer

Created 1 year ago

1,357 stars

Top 29.5% on SourcePulse

Project Summary

Hallo3 animates a static portrait image from speech audio, producing highly dynamic and realistic video. It targets researchers and developers in generative AI and computer vision, and it is built on a Video Diffusion Transformer architecture that the project presents as achieving state-of-the-art results for audio-driven portrait animation.

How It Works

Hallo3 uses a Video Diffusion Transformer (VDT) model built on the CogVideo-5B I2V architecture. It generates high-fidelity, temporally coherent video from a single image and an audio input. The transformer backbone captures long-range dependencies across frames, which matters for realistic motion and expression synthesis, while the diffusion process is responsible for the visual quality of each frame.

Quick Start & Requirements

Install: Clone the repository, create a conda environment (conda create -n hallo python=3.10), activate it (conda activate hallo), and install the requirements (pip install -r requirements.txt). ffmpeg is also required (apt-get install ffmpeg on Debian/Ubuntu).
Pretrained Models: Download from HuggingFace (huggingface-cli download fudan-generative-ai/hallo3 --local-dir ./pretrained_models). Requires models for audio separation, text encoding, face analysis, and the core VDT.
Inference Data: Reference image (1:1 or 3:2 aspect ratio), WAV audio (English, clear vocals).
Demo: Gradio UI via python hallo3/app.py.
Docs: Project page linked in README.
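
The inference data requirements above can be checked before launching a run. The following is a minimal sketch, not part of Hallo3: the helper names aspect_ratio_ok and is_readable_wav are hypothetical, the 2% tolerance is an arbitrary choice, and "3:2" is assumed to mean width:height. It uses only the Python standard library and does not verify that the audio is English or has clear vocals.

```python
import wave
from fractions import Fraction

# Aspect ratios listed for reference images.
# Assumption: ratios are interpreted as width:height.
ALLOWED_RATIOS = (Fraction(1, 1), Fraction(3, 2))


def aspect_ratio_ok(width: int, height: int, tolerance: float = 0.02) -> bool:
    """Return True if width/height is within a relative tolerance of an allowed ratio."""
    if width <= 0 or height <= 0:
        return False
    ratio = width / height
    return any(
        abs(ratio - float(allowed)) <= tolerance * float(allowed)
        for allowed in ALLOWED_RATIOS
    )


def is_readable_wav(path) -> bool:
    """Return True if the file opens as a WAV file and contains at least one frame."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, truncated, or missing.
        return False
```

For example, aspect_ratio_ok(768, 512) accepts a 3:2 image, while a 1920x1080 frame is rejected and would need cropping first. Image dimensions have to come from elsewhere (an image library of your choice), since the standard library does not decode common image formats.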

Highlighted Details

Accepted to CVPR 2025.
Released over 70 hours of talking-head videos and 50 hours of dynamic clips as training data.
Fine-tuned derivative of the CogVideo-5B I2V model.
Supports batch inference via provided scripts.

Maintenance & Community

Developed by Fudan University and Baidu Inc.
No explicit community links (Discord/Slack) or roadmap provided in the README.

Licensing & Compatibility

The project is a derivative work of CogVideo-5B, which is open-source. The use, distribution, and modification of Hallo3 must comply with the CogVideo-5B LICENSE. Specific terms of the CogVideo-5B license are not detailed in this README.

Limitations & Caveats

Audio input is restricted to English due to training data limitations.
Potential social risks related to deepfakes and misuse are acknowledged, with a call for ethical guidelines and responsible use.

Health Check

Last Commit: 10 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

15 stars in the last 30 days

Explore Similar Projects

KDTalker by chaolongy

Talking portrait generator research paper

Created 10 months ago

Updated 4 months ago

LIHQ by johnGettings

AI presenter for generating synthetic speaker videos

Created 3 years ago

Updated 2 years ago

memo by memoavatar

Talking video generation research paper

Created 1 year ago

Updated 5 months ago

ER-NeRF by Fictionarry

Talking head synthesis via efficient region-aware neural radiance fields

Created 2 years ago

Updated 10 months ago

AniTalker by X-LANCE

Talking face animation via decoupled motion encoding (ACM MM 2024 paper)

Created 1 year ago

Updated 1 year ago

RAD-NeRF by ashawkey

PyTorch for real-time neural talking head synthesis

Created 3 years ago

Updated 1 year ago

SyncTalk by ZiqiaoPeng

Talking head synthesis research paper (CVPR 2024)

Created 2 years ago

Updated 3 months ago

Sonic by jixiaozhong

Research paper implementation for audio-driven portrait animation

Created 1 year ago

Updated 3 days ago

Ultralight-Digital-Human by anliyuan

Digital human model for mobile, real-time use

Created 1 year ago

Updated 3 months ago

hallo2 by fudan-generative-vision

Audio-driven portrait animation for long durations and high resolutions

Created 1 year ago

Updated 10 months ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

hallo by fudan-generative-vision

Audio-driven visual synthesis for portrait image animation

Created 1 year ago

Updated 1 year ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

SadTalker by OpenTalker

Talking face animation from a single image and audio

Created 3 years ago

Updated 1 year ago
