Diffusion model for human image generation from skeleton poses
HumanSD is an open-source implementation for controllable human image generation guided by skeletal poses. It targets researchers and developers in computer vision and generative AI who need precise control over human figures in generated images, offering superior performance in challenging poses, artistic styles, and multi-person scenarios compared to existing methods like ControlNet.
How It Works
HumanSD fine-tunes the Stable Diffusion model using a novel heatmap-guided denoising loss. This approach directly injects skeletal pose information into the diffusion process, strengthening the pose condition during training without causing catastrophic forgetting. This native integration is more efficient and effective than dual-branch diffusion methods.
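The idea behind a heatmap-guided denoising loss can be sketched as a re-weighting of the standard diffusion MSE objective, so that pixels covered by the skeletal-pose heatmap contribute more to the loss. This is a minimal NumPy sketch, not the project's implementation; the function name, the `alpha` weighting parameter, and the exact weighting formula are illustrative assumptions.

```python
import numpy as np

def heatmap_weighted_loss(eps_pred, eps_true, heatmap, alpha=2.0):
    """Hypothetical sketch of a heatmap-weighted denoising loss.

    eps_pred / eps_true: predicted and ground-truth noise (same shape).
    heatmap: pose heatmap in [0, 1], same spatial shape as the noise.
    alpha: assumed scalar controlling how strongly pose regions are
    emphasized relative to the background.
    """
    sq_err = (eps_pred - eps_true) ** 2
    # Background pixels get weight 1; pose pixels get up to 1 + alpha.
    weight = 1.0 + alpha * heatmap
    return float(np.mean(weight * sq_err))
```

With `heatmap = 0` everywhere this reduces to the plain MSE loss, which is consistent with the claim that the pose condition is strengthened without replacing the base training objective.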
Quick Start & Requirements
Install the Python dependencies from requirements.txt, plus MMPose (v0.29.0 recommended). Run python scripts/pose2img.py for the command-line demo, or python scripts/gradio/pose2img.py for the Gradio UI. Comparison with ControlNet and T2I-Adapter requires additional setup and checkpoint downloads.
Highlighted Details
Maintenance & Community
The project is associated with ICCV 2023. Key contributors are from the International Digital Economy Academy (IDEA) and The Chinese University of Hong Kong. The project acknowledges contributions from LAION, DeepFloyd (Stability AI), and OpenCLIP.
Licensing & Compatibility
The repository does not explicitly state a license in the README. However, it is based on Stable Diffusion, which is distributed under the CreativeML OpenRAIL-M license, a license that allows commercial use but imposes use-based restrictions. Commercial use or closed-source linking would require explicit license verification.
Limitations & Caveats
The README mentions that some code modifications might be necessary for T2I-Adapter integration due to path conflicts. The dataset preparation, especially for Laion-Human, involves complex file structures and requires careful adherence to instructions.