HumanSD by IDEA-Research

Diffusion model for human image generation from skeleton poses

Created 2 years ago

305 stars

Top 87.9% on SourcePulse

Project Summary

HumanSD is an open-source implementation of controllable human image generation guided by skeletal poses. It targets computer vision and generative AI researchers and developers who need precise control over human figures in generated images. The project reports better results than existing methods such as ControlNet on challenging poses, artistic styles, and multi-person scenes.

How It Works

HumanSD fine-tunes the Stable Diffusion model with a novel heatmap-guided denoising loss. Pose information is injected directly into the diffusion process, which strengthens the pose condition during training without causing catastrophic forgetting. The authors present this native integration as more efficient and effective than dual-branch diffusion methods, which attach a separate trainable branch to a frozen base model.
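To make the idea of a heatmap-guided loss concrete, here is a minimal sketch in plain Python. It assumes one simple form: a squared-error denoising loss in which each pixel is scaled by 1 + weight * heatmap, so errors near the pose count for more. The function names, the Gaussian rendering of keypoints, and the weight value are all illustrative assumptions. This summary does not describe HumanSD's actual formulation, so the sketch should not be read as the project's implementation.

```python
import math


def gaussian_heatmap(keypoints, height, width, sigma=1.5):
    """Render a height x width map with a Gaussian bump at each (x, y) keypoint.

    Hypothetical helper: stands in for whatever pose heatmap the model uses.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - kx) ** 2 + (y - ky) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Keep the strongest response when bumps overlap.
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def heatmap_weighted_mse(noise_pred, noise_true, heatmap, weight=0.1):
    """Mean squared error with each pixel scaled by 1 + weight * heatmap.

    Assumed loss form for illustration only; weight=0 gives plain MSE.
    """
    total, count = 0.0, 0
    for row_pred, row_true, row_heat in zip(noise_pred, noise_true, heatmap):
        for pred, true, heat in zip(row_pred, row_true, row_heat):
            total += (1.0 + weight * heat) * (pred - true) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    size = 5
    heatmap = gaussian_heatmap([(2, 2)], height=size, width=size)
    target = [[0.0] * size for _ in range(size)]

    # Same-sized error placed on the keypoint versus in a far corner.
    on_pose = [[0.0] * size for _ in range(size)]
    on_pose[2][2] = 1.0
    off_pose = [[0.0] * size for _ in range(size)]
    off_pose[0][0] = 1.0

    print(heatmap_weighted_mse(on_pose, target, heatmap))
    print(heatmap_weighted_mse(off_pose, target, heatmap))
```

In this toy setup, an error placed on the keypoint yields a larger loss than an equal error in the corner, which is the sense in which such a weighting would strengthen the pose condition.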

Quick Start & Requirements

Install: Clone the repository, then install PyTorch (v1.12.1 recommended), the packages listed in requirements.txt, and MMPose (v0.29.0 recommended).
Prerequisites: Python 3.9, PyTorch 1.12.1, CUDA 11.3.
Checkpoints: Download the HumanSD checkpoints and the Stable Diffusion v2.1 checkpoint.
Demo: Run python scripts/pose2img.py for the command-line demo or python scripts/gradio/pose2img.py for the Gradio UI. Comparing against ControlNet and T2I-Adapter requires additional setup and checkpoint downloads.
Links: Project Page, Paper, Code, Video, Data
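Because the recommended versions are pinned, it can help to check the environment before running the demos. The sketch below is not part of the repository. It uses only the standard library's importlib.metadata, and it assumes the installed distributions are named torch and mmpose. The helper name find_mismatches is hypothetical.

```python
from importlib import metadata

# Versions listed as recommended above; the keys are assumed distribution names.
RECOMMENDED = {"torch": "1.12.1", "mmpose": "0.29.0"}


def find_mismatches(recommended, lookup=metadata.version):
    """Return {package: (wanted, found)} for packages that are missing or differ.

    `lookup` defaults to importlib.metadata.version and can be swapped in tests.
    """
    mismatches = {}
    for name, wanted in recommended.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu113" when comparing versions.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(RECOMMENDED)
    for name, (wanted, found) in problems.items():
        print(f"{name}: recommended {wanted}, found {found or 'not installed'}")
    if not problems:
        print("Installed versions match the recommendations.")
```

A mismatch is not necessarily fatal, since the versions are described as recommended rather than required. The check only reports where the environment differs.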

Highlighted Details

Fine-tuned Stable Diffusion with a novel heatmap-guided denoising loss.
Achieves superior results in challenging poses, artistic styles, and multi-person scenarios.
Trained on a custom assembly of three large-scale human-centric datasets.
Offers command-line and Gradio demos for quick evaluation.

Maintenance & Community

The project is associated with ICCV 2023. Key contributors are from the International Digital Economy Academy (IDEA) and The Chinese University of Hong Kong. The project acknowledges contributions from LAION, DeepFloyd (Stability AI), and OpenCLIP.

Licensing & Compatibility

The README does not explicitly state a license for the repository. The project builds on Stable Diffusion v2.1, whose weights are distributed under the CreativeML Open RAIL++-M license. That license carries use-based restrictions and is not a conventional permissive license. Commercial use or closed-source integration therefore requires explicit license verification.

Limitations & Caveats

The README notes that T2I-Adapter integration may require code modifications because of path conflicts. Dataset preparation, especially for Laion-Human, involves a complex file structure and requires following the instructions closely.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days