Audio-driven visual synthesis for portrait image animation
Hallo animates a single portrait image from a speech recording, producing lip movements synchronized with the audio along with matching head motion. It is aimed at researchers and developers working on generative AI, computer vision, and audio-visual synthesis who want to turn a static image and a speech clip into a talking-portrait video.
How It Works
Hallo takes a hierarchical approach to audio-driven visual synthesis. It builds on pre-trained models for face analysis, audio separation, and motion generation, and uses Stable Diffusion for image synthesis. Given a source image and driving audio, the system extracts features such as facial landmarks and motion cues and uses them to animate the portrait. The hierarchical structure conditions different aspects of the animation (lip movement, facial expression, and head pose) on the audio separately, which allows finer control over each and leads to more natural, expressive results.
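The hierarchical idea can be illustrated with a toy example. This is a minimal sketch, not Hallo's implementation: it assumes one audio-conditioned feature value per spatial position and nested binary masks for the lip, face, and full-frame (pose) regions, and combines them with per-region weights. The function name fuse_hierarchical is hypothetical, and the region names lip, face, and pose are only assumed to correspond to the lip, face, and pose weighting options in Hallo's inference settings.

```python
from __future__ import annotations


def fuse_hierarchical(
    attn_out: list[float],
    masks: dict[str, list[int]],
    weights: dict[str, float],
) -> list[float]:
    """Combine one audio-conditioned feature map under several region masks.

    Hypothetical illustration: each region contributes weight * mask * feature
    at every position, so a position covered by nested regions (lip inside
    face inside the full frame) accumulates all of their weighted contributions.
    """
    fused = [0.0] * len(attn_out)
    for region, mask in masks.items():
        if len(mask) != len(attn_out):
            raise ValueError(f"mask for {region!r} has the wrong length")
        w = weights.get(region, 1.0)  # regions without an explicit weight default to 1.0
        for i, (a, m) in enumerate(zip(attn_out, mask)):
            fused[i] += w * m * a
    return fused
```

For example, fuse_hierarchical([1.0, 1.0, 1.0, 1.0], {"lip": [0, 0, 0, 1], "face": [0, 0, 1, 1], "pose": [1, 1, 1, 1]}, {"lip": 2.0, "face": 1.0, "pose": 0.5}) returns [0.5, 0.5, 1.5, 3.5]: raising the lip weight strengthens the audio's influence only where the lip mask is set, leaving the rest of the frame unchanged.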
Quick Start & Requirements
Create a conda environment (conda create -n hallo python=3.10), activate it (conda activate hallo), install the requirements (pip install -r requirements.txt), and then install the package (pip install .). FFmpeg is also required (apt-get install ffmpeg).
Download the pretrained models from Hugging Face (git clone https://huggingface.co/fudan-generative-ai/hallo pretrained_models).
Run inference (python scripts/inference.py --source_image <image_path> --driving_audio <audio_path>).
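The setup and inference steps can be wrapped in a small launcher that checks the prerequisites before calling the inference script. This is a minimal sketch, assuming it runs from the repository root inside the activated hallo environment; the helper names build_inference_command, check_prerequisites, and main are hypothetical, and only the --source_image and --driving_audio flags come from the documented command.

```python
import shlex
import shutil
import subprocess
import sys
from pathlib import Path


def build_inference_command(source_image: str, driving_audio: str,
                            script: str = "scripts/inference.py") -> list[str]:
    """Return the argv for the inference script using the two documented flags."""
    # sys.executable is the interpreter of the current (ideally activated conda) env.
    return [sys.executable, script,
            "--source_image", source_image,
            "--driving_audio", driving_audio]


def check_prerequisites(repo_root: Path) -> list[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not (repo_root / "pretrained_models").is_dir():
        problems.append("pretrained_models/ directory missing (clone it from Hugging Face)")
    return problems


def main(argv: list[str]) -> int:
    """Hypothetical entry point: main([image_path, audio_path])."""
    if len(argv) != 2:
        print("usage: <image_path> <audio_path>", file=sys.stderr)
        return 2
    repo_root = Path.cwd()
    issues = check_prerequisites(repo_root)
    if issues:
        print("\n".join(issues), file=sys.stderr)
        return 1
    cmd = build_inference_command(argv[0], argv[1])
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, cwd=repo_root).returncode
```

To use it as a script, save it under a name of your choosing and add if __name__ == "__main__": sys.exit(main(sys.argv[1:])) at the bottom. Running through sys.executable rather than a bare python ensures the inference script uses the same environment the launcher was started in.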
Maintenance & Community
The project has seen significant community engagement, with community-developed resources such as a WebUI, Windows support, and Docker images. A roadmap indicates ongoing work on improving Mandarin Chinese support.
Licensing & Compatibility
The repository does not explicitly state a license. However, it acknowledges contributions from other repositories which may have their own licenses. Users should verify licensing for commercial use.
Limitations & Caveats
The driving audio must be in English because of limitations in the training data. There is an open bug in which the volume of the driving audio affects inference results (related to audio normalization). The project is actively developed, and some enhancements are still in progress.
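Until the volume bug is resolved, one possible workaround is to normalize the driving audio to a fixed peak level before inference so that recordings made at different loudness reach a similar level. This is an untested, hypothetical preprocessing step that is not part of Hallo; it assumes an uncompressed 16-bit PCM WAV input, the function name normalize_wav_peak is made up for illustration, and whether peak normalization is the right fix for the reported issue is not established.

```python
import array
import sys
import wave


def normalize_wav_peak(src: str, dst: str, target_peak: float = 0.9) -> float:
    """Scale a 16-bit PCM WAV so its largest absolute sample is target_peak of full scale.

    Returns the gain that was applied. Multi-channel files are scaled with a
    single gain computed over all interleaved samples.
    """
    with wave.open(src, "rb") as r:
        params = r.getparams()
        if params.sampwidth != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        samples = array.array("h", r.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    gain = 1.0 if peak == 0 else target_peak * 32767 / peak  # silent file: leave as-is
    # Round and clamp to the signed 16-bit range to avoid overflow.
    scaled = array.array("h", (max(-32768, min(32767, round(s * gain))) for s in samples))
    if sys.byteorder == "big":
        scaled.byteswap()
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # same channels, sample width, and rate as the input
        w.writeframes(scaled.tobytes())
    return gain
```

Peak normalization is used here because it only rescales the waveform and cannot clip; loudness-based normalization (for example with FFmpeg's loudnorm filter) is an alternative if perceived volume matters more than the peak level.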