Audio-driven portrait animation for long durations and high resolutions
Hallo2 is an open-source project for generating long-duration, high-resolution portrait animations driven by audio. It targets researchers and developers in AI-driven media synthesis, offering a solution for creating realistic talking head videos from static images and audio inputs.
How It Works
Hallo2 employs a diffusion-based approach, leveraging a UNet architecture for denoising. It integrates multiple specialized models for face analysis, motion generation, and audio processing. The system processes input images and audio to generate synchronized facial movements and expressions, with an optional super-resolution module for enhanced output quality.
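The two-stage flow described here (animation first, optional super-resolution second) can be sketched as a small driver that only assembles the commands listed under Quick Start. This is a minimal illustration, not part of Hallo2: the function names build_pipeline_commands and render are hypothetical, and the video and output paths in the usage section are placeholders, since the source does not say where the animation stage writes its result.

```python
import shlex
from typing import List, Optional


def build_pipeline_commands(
    config_path: str = "./configs/inference/long.yaml",
    animated_video: Optional[str] = None,
    sr_output_dir: Optional[str] = None,
) -> List[List[str]]:
    """Return argv lists for the animation stage and, optionally, super-resolution.

    Hypothetical helper: it builds the documented command lines and runs nothing.
    """
    commands = [
        ["python", "scripts/inference_long.py", "--config", config_path],
    ]
    # Super-resolution is optional: add it only when both paths are supplied.
    if animated_video is not None and sr_output_dir is not None:
        commands.append([
            "python", "scripts/video_sr.py",
            "--input_path", animated_video,
            "--output_path", sr_output_dir,
        ])
    return commands


def render(commands: List[List[str]]) -> str:
    """Join argv lists into a printable, shell-quoted script, one command per line."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(render(build_pipeline_commands(
        animated_video="./output/result.mp4",
        sr_output_dir="./output_sr",
    )))
```

Keeping the commands as argv lists means they can be passed to subprocess.run unchanged once the placeholder paths are replaced with real ones.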
Quick Start & Requirements
Install: create a conda environment (conda create -n hallo python=3.10), activate it, install PyTorch (pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118), install requirements (pip install -r requirements.txt), and install ffmpeg (apt-get install ffmpeg).
Download the pretrained models (huggingface-cli download fudan-generative-ai/hallo2 --local-dir ./pretrained_models).
Run inference: python scripts/inference_long.py --config ./configs/inference/long.yaml for long-duration animation, and python scripts/video_sr.py --input_path [input_video] --output_path [output_dir] for high-resolution output.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats