SteadyDancer by MCG-NJU

AI framework for harmonized human image animation

Created 1 month ago
544 stars

Top 58.5% on SourcePulse

Summary

SteadyDancer addresses common failure modes in human image animation, such as spatio-temporal misalignment and identity drift. It is a robust Image-to-Video framework for high-fidelity, coherent animation, reporting better visual quality and controllability than prior methods while requiring fewer training resources.

How It Works

SteadyDancer adopts an Image-to-Video (I2V) paradigm instead of the Reference-to-Video (R2V) paradigm used by most prior work. Because the reference image serves as the first frame of the generated video, first-frame preservation is guaranteed by construction, and a Motion-to-Image Alignment step reconciles the driving motion with that reference frame. Together these choices address the spatial-structural inconsistencies and temporal start-gaps that cause identity drift and artifacts in R2V methods on real-world data.
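
The first-frame guarantee can be illustrated with a toy sketch. Everything below is hypothetical (plain NumPy placeholders, not the repository's API); it only shows the structural idea that the reference image is the first output frame and the aligned pose sequence drives the rest.

    # Toy illustration of the I2V idea: frame 0 is the reference image itself,
    # so identity is preserved by construction. All names are hypothetical.
    import numpy as np

    def align_poses_to_reference(pose_seq: np.ndarray) -> np.ndarray:
        """Stand-in for Motion-to-Image Alignment: re-express the driving
        poses relative to the clip's first pose to remove the start gap."""
        return pose_seq - pose_seq[0][None, ...]

    def animate(ref_image: np.ndarray, pose_seq: np.ndarray) -> np.ndarray:
        """Stand-in for the I2V generator."""
        frames = [ref_image]                     # first-frame preservation
        for pose in pose_seq[1:]:
            # A real model would synthesize a new frame conditioned on
            # (ref_image, pose); here we simply copy the reference.
            frames.append(ref_image.copy())
        return np.stack(frames)

    ref = np.random.rand(512, 512, 3)            # single reference image
    poses = np.random.rand(16, 18, 2)            # 16 frames of 18 2-D keypoints
    video = animate(ref, align_poses_to_reference(poses))
    assert np.allclose(video[0], ref)            # frame 0 matches the reference exactly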

Quick Start & Requirements

  • Environment: clone the repository, create a Python 3.10 Conda environment, and install PyTorch 2.5.1 (CUDA 12.1), flash-attention, and xformers. Core dependencies are listed in requirements.txt; mmcv may need to be compiled manually from source, which requires GCC 5.4+.
  • Weights: download the DW-Pose and SteadyDancer-14B pre-trained weights from Hugging Face or ModelScope.
  • Inference: extract and align poses with preprocess/pose_align.py, then generate the animation with generate_dancer.py (see the sketch after this list). Single-GPU and multi-GPU (FSDP + xDiT USP) modes are supported.
  • Resources: the official paper and the X-Dance benchmark.
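
The two inference stages can be driven from a short Python script. This is a minimal sketch, not the repository's documented workflow: the script paths (preprocess/pose_align.py, generate_dancer.py) come from the repo, but every command-line flag and file path below is an assumed placeholder; consult the repository README for the actual arguments.

    # Hypothetical driver for the two-stage inference flow. Script paths match
    # the repository; all flags and file paths are assumptions for illustration.
    import subprocess

    REF_IMAGE = "inputs/reference.png"       # single reference image (example path)
    DRIVING_VIDEO = "inputs/driving.mp4"     # motion source video (example path)
    ALIGNED_POSES = "outputs/aligned_poses"  # intermediate pose output (example path)

    # Stage 1: extract DW-Pose keypoints from the driving video and align them
    # to the reference image (the Motion-to-Image Alignment preprocessing step).
    subprocess.run(
        ["python", "preprocess/pose_align.py",
         "--ref_image", REF_IMAGE,           # assumed flag name
         "--video", DRIVING_VIDEO,           # assumed flag name
         "--output", ALIGNED_POSES],         # assumed flag name
        check=True,
    )

    # Stage 2: generate the animation with the SteadyDancer-14B weights
    # (single-GPU case; multi-GPU uses FSDP + xDiT USP instead).
    subprocess.run(
        ["python", "generate_dancer.py",
         "--ref_image", REF_IMAGE,              # assumed flag name
         "--pose_dir", ALIGNED_POSES,           # assumed flag name
         "--save_path", "outputs/result.mp4"],  # assumed flag name
        check=True,
    )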

Highlighted Details

  • First-Frame Preservation: Core design for identity consistency.
  • Image-to-Video Paradigm: Novel formulation for robust animation from a single image and a driving motion sequence.
  • X-Dance Benchmark: New benchmark for evaluating performance on spatio-temporal misalignments.
  • Community Integrations: Support in WanGP and ComfyUI (WanVideoWrapper).
  • GGUF Weights: Released for lower-cost inference.

Maintenance & Community

The project welcomes community contributions and features integrations with WanGP and ComfyUI. Recent updates include GGUF weights, multi-GPU inference support, and the X-Dance benchmark release.

Licensing & Compatibility

The project is released under the permissive Apache-2.0 license, allowing for broad compatibility with commercial and closed-source applications.

Limitations & Caveats

Installation is relatively involved, particularly when mmcv must be compiled manually from source. Multi-GPU inference may yield non-deterministic results, which affects reproducibility. Early community integrations such as the ComfyUI wrapper may lack full feature parity, which can affect performance.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 8
  • Star History: 103 stars in the last 30 days
