PIDM by ankanbhunia

Research paper for person image synthesis using denoising diffusion

created 2 years ago
497 stars

Top 63.3% on sourcepulse

Project Summary

PIDM (Person Image Synthesis via Denoising Diffusion Model) addresses the challenge of generating realistic human images conditioned on pose and appearance. It is targeted at researchers and developers in computer vision and generative AI, offering a novel diffusion-based approach for high-fidelity person image synthesis.

How It Works

PIDM builds on the denoising diffusion probabilistic model (DDPM) framework. Its core contribution is a conditioning mechanism that feeds both the target pose and the reference appearance into the diffusion process. This gives users control over the output: they can synthesize new person images that match a specified pose and visual style.
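
To make the diffusion objective concrete, here is a minimal sketch of the generic DDPM forward (noising) process and the noise-prediction loss, written in plain Python. It is not PIDM's actual code: `toy_denoiser` is a hypothetical placeholder for the conditional network, the `pose` and `appearance` arguments are illustrative stand-ins, and the linear schedule values (1e-4 to 0.02 over 1000 steps) are the common DDPM defaults, assumed here rather than taken from the repository.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (common DDPM defaults, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def toy_denoiser(xt, t, pose, appearance):
    """Hypothetical placeholder for a conditional noise predictor.

    A real model would be a neural network that uses pose and appearance;
    this stub ignores them and always predicts zero noise.
    """
    return [0.0 for _ in xt]


def training_loss(x0, pose, appearance, alpha_bars, rng):
    """Mean squared error between the true and the predicted noise."""
    t = rng.randrange(len(alpha_bars))          # random timestep
    xt, eps = q_sample(x0, t, alpha_bars, rng)  # noised sample and its noise
    eps_hat = toy_denoiser(xt, t, pose, appearance)
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(eps)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
loss = training_loss([0.5, -0.2, 0.1], pose=[1.0, 0.0], appearance=[0.3],
                     alpha_bars=alpha_bars, rng=rng)
print(loss)
```

In a conditional setup like the one described above, the conditioning signals enter only through the denoiser; the forward noising step itself does not depend on pose or appearance.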

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch with CUDA 11.7, Python 3.7.
  • Dataset: Requires the DeepFashion dataset, processed into LMDB format. Pose information extracted with OpenPose is also necessary.
  • Pretrained Model: Download from a provided Google Drive link.
  • Demo: A Google Colab notebook is available for quick experimentation.
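
Before installing, it can help to confirm that the local environment matches the stated prerequisites. The following is a small sketch under stated assumptions: `check_environment` is a hypothetical helper (not part of the repository), and the expected versions are simply the ones listed above.

```python
import sys


def check_environment(expected_python=(3, 7), expected_cuda="11.7"):
    """Report whether Python and the PyTorch CUDA build match the listed versions."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": sys.version_info[:2] == expected_python,
        "torch": None,
        "cuda": None,
        "cuda_matches": False,
    }
    try:
        import torch  # optional: the check still runs if PyTorch is absent
    except ImportError:
        return report
    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_matches"] = report["cuda"] == expected_cuda
    return report


print(check_environment())
```

A mismatch does not necessarily mean the code will fail, only that the environment differs from the one the summary lists.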

Highlighted Details

  • Reports state-of-the-art results compared with prior methods such as ADGAN, PISE, GFLA, DPTN, CASD, and NTED.
  • Supports both pose and appearance control for flexible image generation.
  • Training is resource-intensive, requiring approximately 5 days on 8 A100 GPUs for 300 epochs.
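
To put the training cost in perspective, the stated figures can be converted into GPU-hours. This is back-of-the-envelope arithmetic from the numbers above (5 days, 8 GPUs, 300 epochs), assuming all GPUs run for the full duration.

```python
days, gpus, epochs = 5, 8, 300

wall_hours = days * 24                              # wall-clock hours of training
gpu_hours = wall_hours * gpus                       # total A100 GPU-hours
gpu_hours_per_epoch = gpu_hours / epochs            # GPU-hours spent per epoch
wall_minutes_per_epoch = wall_hours * 60 / epochs   # wall-clock minutes per epoch

print(wall_hours, gpu_hours, gpu_hours_per_epoch, wall_minutes_per_epoch)
```

That works out to roughly 960 A100 GPU-hours in total, or about 24 minutes of wall-clock time per epoch.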

Maintenance & Community

The project's authors are researchers with linked Google Scholar profiles, which suggests academic backing. No community channels (such as Discord or Slack) are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. However, the inclusion of academic citations suggests it is intended for research purposes. Commercial use would require clarification.

Limitations & Caveats

The project requires a specific older version of PyTorch with CUDA 11.7, which may pose compatibility challenges with newer hardware or software stacks. The dataset preparation involves downloading from multiple sources and requires a password from dataset maintainers.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days
