HumanSD by IDEA-Research

Diffusion model for human image generation from skeleton poses

created 2 years ago
297 stars

Top 90.4% on sourcepulse

Project Summary

HumanSD is an open-source implementation of controllable human image generation guided by skeletal poses. It targets computer vision and generative AI researchers and developers who need precise control over human figures in generated images. The authors report better results than existing methods such as ControlNet on challenging poses, artistic styles, and multi-person scenes.

How It Works

HumanSD fine-tunes the Stable Diffusion model with a novel heatmap-guided denoising loss. The loss injects skeletal pose information directly into the diffusion process, strengthening the pose condition during training without causing catastrophic forgetting. The authors describe this native integration as more efficient and effective than dual-branch diffusion methods.
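The idea of a heatmap-guided loss can be illustrated with a minimal sketch. This is not the repository's code: the function name, the plain nested-list inputs, and the weighting form 1 + weight * heatmap are all assumptions made for illustration, and how the heatmap is produced is left out entirely. The sketch only shows how errors near pose keypoints can be made to count more than errors elsewhere in a denoising objective.

```python
from typing import List

Grid = List[List[float]]  # hypothetical 2D layout: rows of per-pixel values


def heatmap_weighted_mse(pred_noise: Grid, true_noise: Grid,
                         heatmap: Grid, weight: float = 1.0) -> float:
    """Mean squared noise error, up-weighted where the heatmap is high.

    Assumed weighting: each squared error is scaled by (1 + weight * heatmap).
    With an all-zero heatmap this reduces to the ordinary mean squared error.
    """
    total = 0.0
    count = 0
    for p_row, t_row, h_row in zip(pred_noise, true_noise, heatmap):
        for p, t, h in zip(p_row, t_row, h_row):
            total += (1.0 + weight * h) * (p - t) ** 2
            count += 1
    return total / count


# Toy example: one wrong pixel that lies on a (hypothetical) pose keypoint.
zeros = [[0.0, 0.0], [0.0, 0.0]]
pred = [[1.0, 0.0], [0.0, 0.0]]
on_pose = [[1.0, 0.0], [0.0, 0.0]]

plain_loss = heatmap_weighted_mse(pred, zeros, zeros)     # no pose emphasis
guided_loss = heatmap_weighted_mse(pred, zeros, on_pose)  # error on keypoint
```

In this toy case the guided loss is twice the plain loss, because the only error sits where the heatmap equals 1. An error of the same size outside the heatmap would leave the loss unchanged.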

Quick Start & Requirements

  • Install: Clone the repository, then install PyTorch (v1.12.1 recommended), the packages in requirements.txt, and MMPose (v0.29.0 recommended).
  • Prerequisites: Python 3.9, PyTorch 1.12.1, CUDA 11.3.
  • Checkpoints: Download the HumanSD checkpoints and the Stable Diffusion v2.1 checkpoint.
  • Demo: Run python scripts/pose2img.py for the command-line demo or python scripts/gradio/pose2img.py for the Gradio UI. Comparing against ControlNet and T2I-Adapter requires additional setup and checkpoint downloads.
  • Links: Project Page, Paper, Code, Video, Data
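Because the setup pins several versions, a quick check before running the demos can save time. The helper below is a hypothetical sketch, not part of the repository: it compares version strings you supply against the versions recommended above and lists any mismatches. The names RECOMMENDED and version_mismatches are invented for this example.

```python
import platform

# Versions recommended in the quick-start notes above.
RECOMMENDED = {
    "python": "3.9",
    "torch": "1.12.1",
    "cuda": "11.3",
    "mmpose": "0.29.0",
}


def version_mismatches(found: dict) -> list:
    """Return one message per component whose version is missing or different.

    A found version matches if it equals the recommended one or extends it
    with a patch or build suffix, e.g. "3.9.16" or "1.12.1+cu113".
    """
    mismatches = []
    for name, wanted in RECOMMENDED.items():
        got = found.get(name)
        matches = got is not None and (
            got == wanted
            or got.startswith(wanted + ".")
            or got.startswith(wanted + "+")
        )
        if not matches:
            mismatches.append(f"{name}: found {got}, recommended {wanted}")
    return mismatches


# Only the Python version is gathered automatically here; the other entries
# are example values you would replace with what your environment reports.
example_env = {
    "python": platform.python_version(),
    "torch": "1.12.1+cu113",
    "cuda": "11.3",
    "mmpose": "0.29.0",
}
for message in version_mismatches(example_env):
    print(message)
```

In a real environment the remaining values could be read from torch.__version__, torch.version.cuda, and mmpose.__version__ once those packages are installed. A mismatch is only a warning that you have left the tested configuration, not proof that the code will fail.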

Highlighted Details

  • Fine-tuned Stable Diffusion with a novel heatmap-guided denoising loss.
  • Achieves superior results in challenging poses, artistic styles, and multi-person scenarios.
  • Trained on a custom assembly of three large-scale human-centric datasets.
  • Offers command-line and Gradio demos for quick evaluation.

Maintenance & Community

The project is associated with ICCV 2023. Key contributors are from International Digital Economy Academy and The Chinese University of Hong Kong. The project acknowledges contributions from LAION, DeepFloyd (Stability AI), and OpenCLIP.

Licensing & Compatibility

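The README does not explicitly state a license. HumanSD builds on Stable Diffusion v2.1, whose weights are distributed under a CreativeML Open RAIL family license that carries use-based restrictions, so the base model should not be assumed to be fully permissive. Commercial use or closed-source linking requires verifying the license terms of both the repository and the base model.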

Limitations & Caveats

The README notes that T2I-Adapter integration may require code modifications because of path conflicts. Dataset preparation, especially for Laion-Human, involves a complex file structure and requires following the instructions closely.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days