Text-guided image editor via diffusion model fine-tuning
Top 91.9% on SourcePulse
Forgedit offers a novel approach to text-guided image editing, enabling users to modify images based on textual prompts while preserving original image characteristics. It targets researchers and practitioners in computer vision and generative AI, providing a faster and more effective method for image manipulation compared to existing state-of-the-art techniques.
How It Works
Forgedit utilizes a vision-language joint optimization framework built on Stable Diffusion. It introduces a vector projection mechanism in the text embedding space to independently control identity similarity and editing strength. A key innovation is the discovery and exploitation of a UNet property where the encoder handles space/structure and the decoder handles appearance/identity. This insight informs "forgetting mechanisms" designed to mitigate overfitting when fine-tuning diffusion models on single images, thereby enhancing editing capabilities.
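As a minimal sketch of the projection idea (the function, argument names, and scaling scheme below are illustrative assumptions, not Forgedit's actual code): the target prompt embedding is decomposed into a component parallel to the source prompt embedding and an orthogonal remainder, and the two parts are rescaled independently to trade identity preservation against editing strength.

```python
# Illustrative sketch of vector projection in the text-embedding space
# (names and scaling scheme are assumptions, not the repository's code).
import torch

def project_embedding(source_emb: torch.Tensor,
                      target_emb: torch.Tensor,
                      identity_scale: float = 1.0,
                      edit_scale: float = 1.0) -> torch.Tensor:
    """Decompose target_emb relative to source_emb and rescale each part."""
    src = source_emb.flatten()
    tgt = target_emb.flatten()
    # Component of the target embedding along the source direction
    # (associated with identity similarity).
    parallel = (tgt @ src) / (src @ src) * src
    # Orthogonal remainder (associated with the requested edit).
    orthogonal = tgt - parallel
    combined = identity_scale * parallel + edit_scale * orthogonal
    return combined.reshape(target_emb.shape)
```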
Quick Start & Requirements
Editing is run via accelerate launch with the provided Python scripts (e.g., src/sample_forgedit_batch_textencoder.py). Dependencies are listed in requirements.txt (includes Diffusers, Stable Diffusion 1.4 or SG161222/Realistic_Vision_V6.0_B1_noVAE, and a BLIP model), and an NVIDIA GPU is required (A100/A800 recommended for speed). CUDA 12 is not explicitly mentioned but is implied by modern GPU usage.
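As a rough illustration of how the listed dependencies fit together (the hub IDs below, such as Salesforce/blip-image-captioning-base and CompVis/stable-diffusion-v1-4, are assumptions; the repository's scripts may use different variants), BLIP presumably captions the source image to obtain a source prompt, and a Stable Diffusion pipeline is then fine-tuned on that image/prompt pair before editing:

```python
# Rough sketch of loading the listed dependencies
# (hub IDs are assumptions, not taken from the repository's scripts).
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from transformers import BlipForConditionalGeneration, BlipProcessor

device = "cuda"

# BLIP produces a caption of the source image, which can serve as the
# source prompt for fine-tuning.
blip_id = "Salesforce/blip-image-captioning-base"  # assumed variant
processor = BlipProcessor.from_pretrained(blip_id)
blip = BlipForConditionalGeneration.from_pretrained(blip_id).to(device)

image = Image.open("source.png").convert("RGB")
inputs = processor(image, return_tensors="pt").to(device)
source_prompt = processor.decode(
    blip.generate(**inputs, max_new_tokens=30)[0], skip_special_tokens=True
)

# Stable Diffusion 1.4 (or the Realistic Vision checkpoint named above) is the
# base model that gets fine-tuned on the single source image.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to(device)
```

The actual fine-tuning and editing are driven by the repository's accelerate launch scripts rather than this snippet.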
Highlighted Details
Multiple forgetting strategies (e.g., encoderkv, donotforget) to combat overfitting; a sketch of the idea follows.
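The forgetting options are named but not explained in this summary. The sketch below shows one plausible reading (the helper, the parameter-name patterns, and the exact meaning of encoderkv are assumptions; donotforget is taken to mean keeping every fine-tuned weight): selected groups of fine-tuned UNet parameters are restored to their pre-fine-tuning values so the single-image fine-tune does not overfit.

```python
# Illustrative sketch only: restore selected UNet parameter groups to their
# pre-fine-tuning values after a single-image fine-tune.
import torch

def apply_forgetting(unet: torch.nn.Module,
                     pretrained_state: dict,
                     strategy: str = "encoderkv") -> None:
    """pretrained_state: the UNet state_dict saved before fine-tuning."""
    if strategy == "donotforget":
        return  # keep every fine-tuned weight
    for name, param in unet.named_parameters():
        # Assumed reading of "encoderkv": reset the attention key/value
        # projections of the encoder (down) blocks to their original values;
        # other strategies would match different parameter names.
        if (strategy == "encoderkv"
                and "down_blocks" in name
                and ("to_k" in name or "to_v" in name)):
            param.data.copy_(pretrained_state[name].to(param.device))
```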
Maintenance & Community
Maintained by witcherofresearch.
Licensing & Compatibility
Limitations & Caveats
The README mentions that results for "DreamBooth+Forgedit" on TEdBench are not provided for quantitative comparison. Specific BLIP model variants are recommended for optimal performance, suggesting potential sensitivity to dependencies.
Last updated: 1 year ago (Inactive)