prompt-to-prompt by google

Image editing via prompt manipulation, based on attention control

created 2 years ago
3,362 stars

Top 14.8% on sourcepulse

Project Summary

This repository implements Prompt-to-Prompt, a method for editing images generated by text-to-image diffusion models such as Latent Diffusion and Stable Diffusion. Users edit an image by editing its text prompt: replacing an object, refining details, or re-weighting a concept. It is aimed at researchers and practitioners in generative AI.

How It Works

The method intercepts and modifies the attention maps inside the diffusion model's U-Net during image generation. Users subclass the AttentionControl abstract class to define how attention weights are altered in response to a prompt edit. Because attention maps determine how each prompt token influences each region of the image, controlling them lets an edit change the targeted content while leaving the rest of the image largely intact.
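
As a rough illustration of the controller idea described above, the sketch below defines a minimal abstract controller and one subclass in plain Python. It is not the repository's code: the method names, the is_cross flag, the list-of-lists attention maps, and the toy attention_layer function are all assumptions made for illustration. The actual implementation operates on PyTorch tensors inside the U-Net.

```python
import abc
import math
from typing import List

# Assumed toy representation: rows are image positions, columns are prompt tokens.
AttentionMap = List[List[float]]


class AttentionControl(abc.ABC):
    """Hypothetical minimal controller interface (not the repository's exact API)."""

    @abc.abstractmethod
    def forward(self, attn: AttentionMap, is_cross: bool) -> AttentionMap:
        """Return the (possibly modified) attention map for one layer call."""

    def __call__(self, attn: AttentionMap, is_cross: bool) -> AttentionMap:
        return self.forward(attn, is_cross)


class IdentityControl(AttentionControl):
    """Leaves every attention map unchanged (plain generation)."""

    def forward(self, attn: AttentionMap, is_cross: bool) -> AttentionMap:
        return attn


class InjectSourceAttention(AttentionControl):
    """Overwrites cross-attention with maps recorded from the source prompt."""

    def __init__(self, source_attn: AttentionMap) -> None:
        self.source_attn = source_attn

    def forward(self, attn: AttentionMap, is_cross: bool) -> AttentionMap:
        if is_cross:
            # Copy so later edits cannot mutate the stored source maps.
            return [row[:] for row in self.source_attn]
        return attn  # self-attention is passed through untouched in this sketch


def softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(scores: AttentionMap, controller: AttentionControl,
                    is_cross: bool) -> AttentionMap:
    """Toy stand-in for a U-Net attention layer: softmax, then the controller hook."""
    attn = [softmax(row) for row in scores]
    return controller(attn, is_cross)


if __name__ == "__main__":
    source = attention_layer([[2.0, 0.0], [0.0, 2.0]], IdentityControl(), is_cross=True)
    edited = attention_layer([[0.0, 0.0], [0.0, 0.0]], InjectSourceAttention(source), is_cross=True)
    print(edited == source)
```

The point of the hook is that the layer itself stays unchanged; each kind of edit becomes a different controller subclass passed into the same generation loop.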

Quick Start & Requirements

  • Install: Requires Python 3.8, PyTorch 1.11, and packages listed in requirements.txt.
  • Hardware: Tested on a Tesla V100 16GB; requires a GPU with at least 12GB of VRAM.
  • Usage: End-to-end examples are provided in the prompt-to-prompt_ldm and prompt-to-prompt_stable notebooks.

Highlighted Details

  • Supports three main edit types: Replacement (swapping tokens), Refinement (adding tokens), and Re-weight (adjusting token influence).
  • Controls how long during the diffusion process an edit is applied, via cross_replace_steps (cross-attention) and self_replace_steps (self-attention).
  • Includes Null-text Inversion for intuitive text-based editing of real images.
  • Implemented over Latent Diffusion and Stable Diffusion models.
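
The three edit types and the step-based timing listed above can be sketched on toy attention maps as follows. This is an illustrative approximation, not the repository's implementation: the function names, the token mapping argument, and the list-of-lists maps are hypothetical, and the sketch assumes that a replace-steps value is a fraction of the total number of diffusion steps.

```python
from typing import List, Optional

# Assumed toy representation: rows are image positions, columns are prompt tokens.
AttentionMap = List[List[float]]


def should_inject(step: int, num_steps: int, replace_fraction: float) -> bool:
    """Assumption: the edit is applied only during the first fraction of steps."""
    return step < int(num_steps * replace_fraction)


def replace_edit(source: AttentionMap, target: AttentionMap) -> AttentionMap:
    """Replacement: reuse the source maps so a swapped token inherits the layout."""
    if len(source[0]) != len(target[0]):
        raise ValueError("replacement assumes prompts with the same token count")
    return [row[:] for row in source]


def refine_edit(source: AttentionMap, target: AttentionMap,
                mapping: List[Optional[int]]) -> AttentionMap:
    """Refinement: mapping[j] is the source column for target token j,
    or None when token j is new and keeps its own attention."""
    return [
        [src_row[m] if m is not None else tgt_row[j] for j, m in enumerate(mapping)]
        for src_row, tgt_row in zip(source, target)
    ]


def reweight_edit(attn: AttentionMap, token_index: int, scale: float) -> AttentionMap:
    """Re-weight: scale the attention that every position pays to one token."""
    return [
        [w * scale if j == token_index else w for j, w in enumerate(row)]
        for row in attn
    ]


if __name__ == "__main__":
    source = [[0.7, 0.3], [0.2, 0.8]]                # two-token source prompt
    target = [[0.5, 0.25, 0.25], [0.1, 0.6, 0.3]]    # one token added in the middle
    num_steps = 10
    for step in range(num_steps):
        if should_inject(step, num_steps, replace_fraction=0.5):
            attn = refine_edit(source, target, mapping=[0, None, 1])
        else:
            attn = target  # later steps run on the edited prompt's own attention
        print(step, attn)
```

Injecting source attention only for the early steps is a trade-off: a larger fraction preserves more of the original layout, while a smaller one gives the edited prompt more freedom to change the image.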

Maintenance & Community

  • This is not an officially supported Google product.
  • Key contributors are listed as authors on the associated arXiv papers.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The 12GB VRAM minimum may rule out lower-end consumer GPUs.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2

Star History

  • 79 stars in the last 90 days
