BrushEdit  by TencentARC

AI agent for image inpainting and editing

created 7 months ago
572 stars

Top 57.2% on sourcepulse

Project Summary

BrushEdit is a unified AI agent for image inpainting and editing, targeting researchers and practitioners in computer vision and generative AI. It offers both automated and interactive editing capabilities, leveraging a pipeline that combines multi-modal large language models (MLLMs) with a dual-branch diffusion inpainting model (BrushNetX) for precise and context-aware image manipulation.

How It Works

BrushEdit employs a four-step pipeline: editing category classification, primary editing object identification, mask and target caption generation, and finally, image inpainting. Steps one through three utilize pre-trained MLLMs and detection models (GroundingDINO, SAM) to interpret user instructions, identify targets, and generate masks and descriptive captions. The core image editing is performed by BrushNetX, an enhanced diffusion model designed for high-fidelity inpainting and background preservation, guided by the generated masks and captions.
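
The four steps above can be pictured as a chain of pluggable stages. The sketch below is illustrative only and is not BrushEdit's code: every name in it (EditPlan, plan_edit, run_edit, and the toy_* stand-ins) is hypothetical, the category labels are assumed, and the toy functions merely stand in for the MLLM, GroundingDINO/SAM, and BrushNetX so that the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # toy grayscale image (assumption for this sketch)
Mask = List[List[int]]   # 1 = pixel to repaint, 0 = pixel to keep


@dataclass
class EditPlan:
    category: str        # step 1 output (labels here are assumed)
    target_object: str   # step 2 output
    mask: Mask           # step 3 output
    target_caption: str  # step 3 output


def plan_edit(instruction, image, classify, find_object, make_mask, write_caption):
    """Steps 1-3: turn a user instruction into a mask and a target caption."""
    category = classify(instruction)
    target = find_object(instruction, category)
    mask = make_mask(image, target)
    caption = write_caption(instruction, target)
    return EditPlan(category, target, mask, caption)


def run_edit(instruction, image, inpaint, **planners):
    """Step 4: hand the mask and caption to the inpainting model."""
    plan = plan_edit(instruction, image, **planners)
    return inpaint(image, plan.mask, plan.target_caption), plan


# Toy stand-ins so the sketch runs without any model weights.
def toy_classify(instruction):
    return "remove" if instruction.lower().startswith("remove") else "edit"


def toy_find_object(instruction, category):
    return instruction.split()[-1]


def toy_make_mask(image, target):
    # Pretend every non-zero pixel belongs to the target object.
    return [[1 if px > 0 else 0 for px in row] for row in image]


def toy_write_caption(instruction, target):
    return f"a photo without the {target}"


def toy_inpaint(image, mask, caption):
    # Pretend "inpainting" means zeroing the masked pixels.
    return [[0 if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


image = [[0, 5], [7, 0]]
edited, plan = run_edit(
    "Remove the cat", image, toy_inpaint,
    classify=toy_classify, find_object=toy_find_object,
    make_mask=toy_make_mask, write_caption=toy_write_caption,
)
```

Passing the stages in as callables mirrors the split described above: the planning stages and the inpainting model can be swapped independently.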

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using pip install -e . and pip install -r app/requirements.txt.
  • Prerequisites: CUDA 11.8, PyTorch 2.0.1, Python 3.10.6. Requires downloading pre-trained checkpoints for BrushNetX, Stable Diffusion base models (e.g., RealisticVisionV60B1), GroundingDINO, SAM, and VLM models (e.g., Qwen2-VL-7B-Instruct).
  • Setup: No time estimate is given. Setup involves cloning the repository, preparing the environment, and downloading the checkpoints (sizes not specified).
  • Demo: Run with sh app/run_app.sh.
  • Links: Project Page, Arxiv, Video, Hugging Face Demo, Hugging Face Model.

Highlighted Details

  • Supports interactive mask manipulation (generation, square/circle, invert, dilate/erode, move).
  • Offers automated target prompt generation and manual editing.
  • Includes blending options for preserving original image details.
  • Maximum resolution is 1024px to prevent Out-of-Memory errors.
  • Recommends GPT-4o for reasoning, with Qwen2-VL-7B-Instruct as a secondary option.
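
The mask operations and blending listed above can be illustrated on a small binary grid. This is a minimal sketch under stated assumptions, not the project's implementation: the function names are hypothetical, dilation and erosion use a 3x3 neighborhood with out-of-frame pixels treated as 0, and blending is shown as a hard composite that keeps original pixels wherever the mask is 0.

```python
from typing import List

Mask = List[List[int]]  # 1 = region to edit, 0 = region to preserve


def invert(mask: Mask) -> Mask:
    return [[1 - v for v in row] for row in mask]


def _neighborhood(mask: Mask, r: int, c: int) -> List[int]:
    """Values in the 3x3 window around (r, c); out-of-frame counts as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)]


def dilate(mask: Mask) -> Mask:
    """Grow the mask: a pixel is set if any neighbor is set."""
    return [[int(any(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask: Mask) -> Mask:
    """Shrink the mask: a pixel stays set only if all neighbors are set."""
    return [[int(all(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def move(mask: Mask, dy: int, dx: int) -> Mask:
    """Shift the mask; pixels pushed outside the frame are dropped."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if mask[r][c] and 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = 1
    return out


def blend(original, edited, mask: Mask):
    """Take edited pixels inside the mask and original pixels outside it."""
    return [[e if m else o for o, e, m in zip(orow, erow, mrow)]
            for orow, erow, mrow in zip(original, edited, mask)]


dot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
grown = dilate(dot)
shrunk = erode(grown)
shifted = move(dot, 1, 1)
composite = blend([[1, 1, 1]] * 3, [[9, 9, 9]] * 3, dot)
```

Dilating a mask before inpainting and then blending with the original is a common way to hide seams at the mask border; whether BrushEdit applies the operations in that order is not stated here.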

Maintenance & Community

  • The project is associated with Tencent ARC, Peking University, The Chinese University of Hong Kong, and Tsinghua University.
  • Contact email: liyaowei01@gmail.com.

Licensing & Compatibility

  • The README does not specify a license for the repository. It notes that the code includes modifications based on diffusers and BrushNet, each of which carries its own license. Whether commercial use or closed-source linking is permitted is not stated.

Limitations & Caveats

  • The project is marked as "TPAMI under review," suggesting it may still be in an experimental or pre-publication phase.
  • Specific licensing details for the BrushEdit code itself are not provided in the README, which could impact commercial adoption.
  • Maximum resolution is limited to 1024px.
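
To stay under the 1024px cap, an input image can be downscaled before editing. The helper below is a hypothetical sketch: it assumes the cap applies to the longest side and that aspect ratio should be preserved, neither of which is confirmed by the summary above.

```python
def fit_within(width: int, height: int, max_side: int = 1024):
    """Return (width, height) scaled so the longest side is at most max_side.

    Assumption: the 1024px limit applies to the longest side of the image.
    Images already within the limit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids floating-point rounding surprises.
    new_w = max(1, width * max_side // longest)
    new_h = max(1, height * max_side // longest)
    return new_w, new_h


landscape = fit_within(2048, 1536)
small = fit_within(800, 600)
portrait = fit_within(3000, 4000)
```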

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 21 stars in the last 90 days
