prompt-to-prompt by google

Image editing via prompt manipulation, based on attention control

Created 3 years ago

3,427 stars

Top 14.1% on SourcePulse

5 Experts Love This Project

chenlin9 (Cofounder of Pika)
omarsar (Founder of DAIR.AI)
huybery (Research Scientist at Alibaba Qwen)
yoland68 (Cofounder of Comfy Org)
and 1 more

Project Summary

This repository provides an implementation of Prompt-to-Prompt, a method for editing images generated by diffusion models such as Stable Diffusion and Latent Diffusion. Users edit an image by editing its text prompt, for example to replace objects, refine details, or re-weight concepts. The project targets researchers and practitioners in generative AI.

How It Works

The core mechanism intercepts and modifies the attention maps inside the diffusion model's U-Net during image generation. Users implement the abstract AttentionControl class, whose hook runs in each attention layer, to define how attention weights change for a given prompt edit. Because the cross-attention maps tie each prompt token to the image regions it influences, editing or injecting these maps controls which parts of the image an edit changes while the rest of the layout is preserved.
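The hook pattern described above can be illustrated without a diffusion model. The sketch below is a framework-free approximation, not the repository's code: attention maps are nested Python lists rather than tensors, the `forward(attn, is_cross, place_in_unet)` signature is an assumption, the layer/step bookkeeping is an assumed way for a controller to track its position in the U-Net, and `CrossAttentionRecorder` is a hypothetical subclass.

```python
import abc
from typing import Dict, List

# Toy attention map: one row per query (pixel), one column per key (prompt token).
AttnMap = List[List[float]]


class AttentionControl(abc.ABC):
    """Minimal stand-in for the abstract hook class (not the repository's code).

    An assumed caller invokes the controller once per attention layer; after
    `num_att_layers` calls it advances to the next diffusion step.
    """

    def __init__(self, num_att_layers: int):
        self.num_att_layers = num_att_layers
        self.cur_att_layer = 0
        self.cur_step = 0

    @abc.abstractmethod
    def forward(self, attn: AttnMap, is_cross: bool, place_in_unet: str) -> AttnMap:
        raise NotImplementedError

    def __call__(self, attn: AttnMap, is_cross: bool, place_in_unet: str) -> AttnMap:
        attn = self.forward(attn, is_cross, place_in_unet)
        self.cur_att_layer += 1
        if self.cur_att_layer == self.num_att_layers:
            # Every layer has been visited, so the next call starts a new step.
            self.cur_att_layer = 0
            self.cur_step += 1
        return attn


class CrossAttentionRecorder(AttentionControl):
    """Hypothetical subclass that records cross-attention maps by U-Net location."""

    def __init__(self, num_att_layers: int):
        super().__init__(num_att_layers)
        self.store: Dict[str, List[AttnMap]] = {}

    def forward(self, attn: AttnMap, is_cross: bool, place_in_unet: str) -> AttnMap:
        if is_cross:
            self.store.setdefault(place_in_unet, []).append([row[:] for row in attn])
        return attn  # observe only; an editing controller would return a modified map


# Simulate 3 diffusion steps over a toy model with two attention layers.
controller = CrossAttentionRecorder(num_att_layers=2)
toy_map = [[0.7, 0.3], [0.2, 0.8]]
for _ in range(3):
    controller(toy_map, is_cross=True, place_in_unet="down")
    controller(toy_map, is_cross=False, place_in_unet="up")
print(controller.cur_step, len(controller.store["down"]))  # 3 3
```

An editing controller would override `forward` to return a modified map (for example, the source prompt's cross-attention) instead of passing `attn` through unchanged.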

Quick Start & Requirements

Install: Requires Python 3.8, PyTorch 1.11, and packages listed in requirements.txt.
Hardware: Tested on a Tesla V100 16GB; requires at least 12GB of VRAM.
Usage: End-to-end examples are provided in the prompt-to-prompt_ldm and prompt-to-prompt_stable notebooks.

Highlighted Details

Supports three main edit types: Replacement (swapping tokens), Refinement (adding tokens), and Re-weight (adjusting token influence).
Controls when an edit is applied during the denoising process via cross_replace_steps (cross-attention) and self_replace_steps (self-attention).
Includes Null-text Inversion, which enables text-based editing of real images rather than only generated ones.
Implemented over Latent Diffusion and Stable Diffusion models.
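To make the edit types and timing controls above concrete, here is a toy, framework-free sketch. It operates on small nested-list "attention maps" rather than the U-Net's tensors, and every function name in it (`edited_attention`, `refine_edit`, `reweight_edit`) is hypothetical. The schedule (use the source prompt's map for the first fraction of steps) is an assumed simplification of how cross_replace_steps and self_replace_steps behave, the default values 0.8 and 0.4 are illustrative, and the repository's own controllers handle token alignment and blending in more detail.

```python
from typing import List

# Toy attention map: one row per query (pixel), one column per prompt token.
AttnMap = List[List[float]]


def copy_map(attn: AttnMap) -> AttnMap:
    return [row[:] for row in attn]


def inject_this_step(step: int, num_steps: int, fraction: float) -> bool:
    # Assumed schedule: inject the source prompt's attention during the first
    # `fraction` of the denoising steps, then let the edited prompt take over.
    return step < int(num_steps * fraction)


def edited_attention(source: AttnMap, edited: AttnMap, step: int, num_steps: int,
                     is_cross: bool, cross_replace_steps: float = 0.8,
                     self_replace_steps: float = 0.4) -> AttnMap:
    # Replacement with timing control: cross- and self-attention each get their
    # own injection window. Assumes both prompts tokenize to the same length.
    fraction = cross_replace_steps if is_cross else self_replace_steps
    if inject_this_step(step, num_steps, fraction):
        return copy_map(source)
    return copy_map(edited)


def refine_edit(source: AttnMap, edited: AttnMap, mapper: List[int]) -> AttnMap:
    # Refinement: mapper[j] is the source-token index aligned with edited token j,
    # or -1 for a newly added token, which keeps its own attention column.
    return [
        [src_row[m] if m >= 0 else edit_row[j] for j, m in enumerate(mapper)]
        for src_row, edit_row in zip(source, edited)
    ]


def reweight_edit(attn: AttnMap, token: int, scale: float) -> AttnMap:
    # Re-weight: scale one token's column up (scale > 1) or down (scale < 1).
    return [[w * scale if j == token else w for j, w in enumerate(row)] for row in attn]


# Demo with made-up 2-pixel maps. Source prompt tokens: ["a", "cat"].
src = [[0.6, 0.4], [0.1, 0.9]]
dst = [[0.5, 0.5], [0.3, 0.7]]
num_steps = 50
cross_steps = sum(edited_attention(src, dst, s, num_steps, is_cross=True) == src
                  for s in range(num_steps))
self_steps = sum(edited_attention(src, dst, s, num_steps, is_cross=False) == src
                 for s in range(num_steps))
print(cross_steps, self_steps)  # 40 20

# Refinement: edited prompt ["a", "fluffy", "cat"] adds one token.
refined = refine_edit(src, [[0.2, 0.3, 0.5], [0.1, 0.2, 0.7]], mapper=[0, -1, 1])
print(refined)  # [[0.6, 0.3, 0.4], [0.1, 0.2, 0.9]]

# Re-weight: double the influence of "cat" (token 1).
reweighted = reweight_edit(src, token=1, scale=2.0)
print(reweighted)  # [[0.6, 0.8], [0.1, 1.8]]
```

Giving self-attention a shorter window than cross-attention in this sketch keeps more of the source layout early on while leaving later steps free to render the edited content.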

Maintenance & Community

This is not an officially supported Google product.
Key contributors are listed as authors on the associated arXiv papers.

Licensing & Compatibility

The repository does not explicitly state a license.

Limitations & Caveats

The 12GB minimum VRAM requirement may limit use on consumer hardware.

Health Check

Last Commit: 1 year ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 0

Star History

9 stars in the last 30 days

Explore Similar Projects

ImgEdit by PKU-YuanGroup

Created 8 months ago

Updated 2 months ago

Awesome-Image-Editing by FudanCVL

Survey of multimodal-guided image editing with diffusion models

Created 1 year ago

Updated 4 months ago

UltraEdit by HaozheZhao

Dataset for instruction-based image editing

Created 1 year ago

Updated 1 year ago

OneReward by bytedance

Unified mask-guided image generation and editing

Created 4 months ago

Updated 3 months ago

Forgedit by witcherofresearch

Text-guided image editor via diffusion model fine-tuning

Created 2 years ago

Updated 1 year ago

BrushEdit by TencentARC

AI agent for image inpainting and editing

Created 1 year ago

Updated 4 months ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

blended-diffusion by omriav

Image editing technique (research paper)

Created 4 years ago

Updated 1 year ago

Awesome-Diffusion-Model-Based-Image-Editing-Methods by SiatMMLab

Survey paper on diffusion model-based image editing methods

Created 2 years ago

Updated 6 months ago

Starred by Theo Browne (Founder of Ping.gg).

pico-banana-400k by apple

Dataset for advancing text-guided image editing

Created 2 months ago

Updated 3 weeks ago

Starred by Andreas Jansson (Cofounder of Replicate), Jiaming Song (Chief Scientist at Luma AI), and 1 more.

CrossAttentionControl by bloc97

Image editing via cross-attention control in Stable Diffusion

Created 3 years ago

Updated 3 years ago

Step1X-Edit by stepfun-ai

Image editing model comparable to closed-source alternatives

Created 8 months ago

Updated 1 week ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA).

FateZero by ChenyangQiQi

Zero-shot video editor (ICCV 2023 Oral) using attention fusion

Created 2 years ago

Updated 2 years ago
