ICT by raywzy

Image completion research paper using transformers

Created 4 years ago

335 stars

Top 81.8% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation of the Image Completion Transformer (ICT) for high-fidelity pluralistic image completion. It is aimed at researchers and practitioners in computer vision and generative modeling who need to fill missing regions of an image with realistic and diverse content. ICT uses transformers because, according to the authors, they capture shape and geometry better than traditional CNN-based methods.

How It Works

ICT uses a two-stage approach. First, a transformer generates a coarse, semantically plausible completion at low resolution. A guided upsampling network then refines that completion to high resolution, restoring fidelity and texture detail. The transformer's attention mechanism lets every position draw on every other position, which captures the long-range dependencies and context that accurate image completion relies on.
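To make the role of attention concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is not taken from the repository: the function name `self_attention`, the toy sizes, and the random weights are assumptions chosen only for illustration. The point it shows is that the weight matrix has one row per token and one column per token, so each position can attend to any other position regardless of distance.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    x: (n_tokens, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns the attended values and the (n_tokens, n_tokens) weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ v, weights

# Toy sizes (assumed): a 4x4 grid of tokens flattened to a sequence of 16.
rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 16, 8, 4
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Row `i` of `weights` describes how much token `i` draws from every other token, including ones far away in the image grid. A convolution with a small kernel would need many stacked layers to connect the same pair of positions.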

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python >=3.6, PyTorch >=1.6, NVIDIA GPU + CUDA, cuDNN.
Pre-trained models must be downloaded separately.
Input images and masks should be in .png format, and masks must be binarized.
The model is trained for 256x256 input resolution.
See the project page for more details.
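The mask and resolution requirements can be checked before running inference. The following is a minimal sketch, assuming masks are held as NumPy arrays with values in 0..255; the helper names `binarize_mask` and `check_pair` and the threshold of 127 are hypothetical and not part of the repository. Loading and saving the .png files (for example with Pillow) is left out, and which of the two values marks the missing region is not specified here, so check the repository before relying on either convention.

```python
import numpy as np

def binarize_mask(mask, threshold=127):
    """Map a grayscale or colour mask to strict 0/255 values (assumed convention)."""
    mask = np.asarray(mask)
    if mask.ndim == 3:                       # collapse colour channels to one
        mask = mask[..., :3].mean(axis=-1)
    return np.where(mask > threshold, 255, 0).astype(np.uint8)

def check_pair(image, mask, size=256):
    """Return True if both arrays are size x size and the mask is binary."""
    image, mask = np.asarray(image), np.asarray(mask)
    ok_shape = image.shape[:2] == (size, size) and mask.shape[:2] == (size, size)
    ok_values = set(np.unique(mask).tolist()) <= {0, 255}
    return ok_shape and ok_values

# Toy data: a soft-edged mask with intermediate gray values.
image = np.zeros((256, 256, 3), dtype=np.uint8)
soft_mask = np.linspace(0, 255, 256 * 256).reshape(256, 256)
hard_mask = binarize_mask(soft_mask)
```

Here `check_pair(image, hard_mask)` passes, while the soft mask fails the check because it contains intermediate gray values.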

Highlighted Details

Official PyTorch implementation of the ICCV 2021 paper.
Transformer-based architecture for improved shape and geometry understanding.
Pluralistic completion allows for multiple diverse results.
Supports inference and training for both transformer and upsampling components.
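One way to picture pluralistic completion is as sampling: if a model assigns a probability distribution to each missing position, drawing from those distributions several times yields several different completions of the same input. The sketch below illustrates that idea only. It assumes per-position logits over a small token vocabulary, uses random numbers in place of a real model's output, and does not reproduce the repository's actual sampling procedure. The name `sample_completion` and all sizes are hypothetical.

```python
import numpy as np

def sample_completion(logits, known_tokens, missing, rng, temperature=1.0):
    """Fill missing positions by sampling from per-position categoricals.

    logits: (n_tokens, vocab); known_tokens: (n_tokens,) ints;
    missing: (n_tokens,) bools, True where a token must be generated.
    """
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)   # stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    tokens = known_tokens.copy()
    for i in np.flatnonzero(missing):
        tokens[i] = rng.choice(probs.shape[-1], p=probs[i])
    return tokens  # known positions are returned unchanged

# Stand-in for model output (assumed): random logits over 8 token values.
n_tokens, vocab = 16, 8
logits = np.random.default_rng(0).normal(size=(n_tokens, vocab))
known_tokens = np.zeros(n_tokens, dtype=np.int64)
missing = np.arange(n_tokens) >= 8            # second half is masked out

# Different seeds give different draws for the same masked input.
samples = [
    sample_completion(logits, known_tokens, missing, np.random.default_rng(seed))
    for seed in (1, 2, 3)
]
```

Each entry in `samples` keeps the known tokens and fills the masked half independently. Lowering `temperature` concentrates the draws on the most likely tokens and so reduces diversity.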

Maintenance & Community

The repository is maintained by Ziyu Wan (@Raywzy). Contact is available via email: raywzy@gmail.com.

Licensing & Compatibility

The repository is for academic research use only. No specific license is mentioned, implying potential restrictions on commercial use.

Limitations & Caveats

The model is trained exclusively for 256x256 resolution. Masks require specific formatting (binarized, .png). Pre-trained models are large and require manual download. The "academic research use only" clause may restrict commercial applications.
