PIDM by ankanbhunia

Research paper for person image synthesis using denoising diffusion

Created 3 years ago

501 stars

Top 62.1% on SourcePulse

Project Summary

PIDM (Person Image Synthesis via Denoising Diffusion Model) generates realistic images of people conditioned on a target pose and a reference appearance. It is aimed at researchers and developers in computer vision and generative AI who want a diffusion-based approach to high-fidelity person image synthesis.

How It Works

PIDM builds on the denoising diffusion probabilistic model (DDPM) framework. Its core contribution is the conditioning mechanism, which feeds both the target pose and the reference appearance into the diffusion process. This gives control over the output: users can synthesize new images of a person that match a specified pose and visual style.
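To make the idea of a pose-conditioned diffusion step concrete, here is a minimal NumPy sketch. It is not taken from the PIDM code. It assumes a standard linear DDPM noise schedule and assumes the pose is supplied as keypoint heatmaps concatenated with the noisy image along the channel axis. All names, shapes, and the 18-keypoint layout are hypothetical, and how the reference appearance is injected is left out.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule commonly used with DDPM (assumed here, not confirmed for PIDM)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alphas_cumprod, noise):
    """Forward diffusion: draw x_t from q(x_t | x_0) in closed form."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise


def build_denoiser_input(x_t, pose_map):
    """Hypothetical conditioning: stack pose heatmaps onto the noisy image channels."""
    return np.concatenate([x_t, pose_map], axis=0)


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)

x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 44))       # stand-in for a target image (C, H, W)
pose_map = rng.uniform(0.0, 1.0, size=(18, 64, 44))  # stand-in for 18 keypoint heatmaps
noise = rng.standard_normal(x0.shape)

x_t = q_sample(x0, 500, alphas_cumprod, noise)        # noisy image at timestep 500
denoiser_input = build_denoiser_input(x_t, pose_map)  # what a noise-prediction network would see
print(denoiser_input.shape)
```

In training, a network would take `denoiser_input` (plus the timestep and some encoding of the reference appearance) and be trained to predict `noise`. That network is omitted here.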

Quick Start & Requirements

Installation: Clone the repository and install dependencies via pip install -r requirements.txt.
Prerequisites: PyTorch with CUDA 11.7, Python 3.7.
Dataset: Requires the DeepFashion dataset, processed into LMDB format. Pose information extracted with OpenPose is also necessary.
Pretrained Model: Download from a provided Google Drive link.
Demo: A Google Colab notebook is available for quick experimentation.
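Because the prerequisites pin specific versions, a quick environment check before installing can save time. The sketch below is illustrative and not part of the repository. The function name `check_environment` and the report keys are hypothetical, and the expected versions are simply the ones listed above.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 7), expected_cuda="11.7"):
    """Compare the local interpreter and PyTorch build against the listed prerequisites."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": sys.version_info[:2] == expected_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_build": None,
        "cuda_matches": False,
        "gpu_visible": False,
    }
    if report["torch_installed"]:
        import torch

        # torch.version.cuda is None for CPU-only builds of PyTorch.
        report["cuda_build"] = torch.version.cuda
        report["cuda_matches"] = report["cuda_build"] == expected_cuda
        report["gpu_visible"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A mismatch does not necessarily mean the code will fail. It only flags that the local setup differs from the versions listed in the prerequisites.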

Highlighted Details

Reports state-of-the-art results against prior methods, including ADGAN, PISE, GFLA, DPTN, CASD, and NTED.
Supports both pose and appearance control for flexible image generation.
Training is resource-intensive: approximately 5 days on 8 A100 GPUs for 300 epochs (about 960 GPU-hours).

Maintenance & Community

The project is associated with several researchers whose Google Scholar profiles are linked, which indicates academic backing. No community channels (such as Discord or Slack) are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The inclusion of academic citations suggests it is intended for research purposes; commercial use would require clarification from the authors.

Limitations & Caveats

The project requires a specific older version of PyTorch with CUDA 11.7, which may pose compatibility challenges with newer hardware or software stacks. The dataset preparation involves downloading from multiple sources and requires a password from dataset maintainers.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days