ImgEdit by PKU-YuanGroup

Created 8 months ago
263 stars

Top 97.0% on SourcePulse

Project Summary

ImgEdit provides a large-scale, high-quality dataset and a comprehensive benchmark suite for image editing tasks, addressing the need for standardized training and evaluation in this domain. It targets AI researchers and developers working on generative models, offering a unified platform for advancing single-turn and multi-turn image manipulation capabilities. The project aims to facilitate the development of more sophisticated and instruction-adherent image editing models.

How It Works

The ImgEdit dataset is curated through a multi-stage pipeline. Images from the Laion-aes dataset are first filtered by aesthetic score, then given dense and short captions by vision-language models such as Qwen2.5-VL-7B and GPT-4o. Objects are detected and segmented with YOLO-World and SAM2, and the results are filtered with CLIP. GPT-4o generates diverse editing prompts, and task-specific editing pipelines, potentially leveraging ComfyUI and Stable Diffusion, carry out the edits. A final GPT-4o-based filtering pass further refines data quality.

ImgEdit-Bench evaluates models across three suites: basic, Understanding-Grounding-Editing (UGE), and multi-turn editing. It assesses instruction adherence, editing quality, and content memory.
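The ordering of the two score-based filtering steps can be illustrated with a minimal sketch. This is not the project's curation code (the pipeline release is still marked as work in progress). The `Sample` fields, the stage helpers, and the thresholds are all hypothetical, and the model-driven stages (captioning, detection, prompt generation, editing) are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical record for one candidate image (field names are assumptions)."""
    image_id: str
    aesthetic_score: float  # assumed to come from an aesthetic predictor
    clip_score: float       # assumed image/text similarity from CLIP


# A stage takes a list of samples and returns the samples that survive it.
Stage = Callable[[List[Sample]], List[Sample]]


def aesthetic_filter(min_score: float) -> Stage:
    """Build a stage that keeps samples at or above an aesthetic threshold."""
    def stage(samples: List[Sample]) -> List[Sample]:
        return [s for s in samples if s.aesthetic_score >= min_score]
    return stage


def clip_filter(min_score: float) -> Stage:
    """Build a stage that keeps samples at or above a CLIP-score threshold."""
    def stage(samples: List[Sample]) -> List[Sample]:
        return [s for s in samples if s.clip_score >= min_score]
    return stage


def run_pipeline(samples: List[Sample], stages: List[Stage]) -> List[Sample]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        samples = stage(samples)
    return samples


if __name__ == "__main__":
    candidates = [
        Sample("a", aesthetic_score=6.0, clip_score=0.30),
        Sample("b", aesthetic_score=4.0, clip_score=0.30),
        Sample("c", aesthetic_score=6.5, clip_score=0.10),
    ]
    # Thresholds are placeholders, not values used by ImgEdit.
    kept = run_pipeline(candidates, [aesthetic_filter(5.0), clip_filter(0.2)])
    print([s.image_id for s in kept])
```

Expressing each step as a function from a list to a list keeps the stages independently testable and makes it straightforward to slot additional stages between the two filters.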

Quick Start & Requirements

  • Dataset Access: Datasets are hosted on Hugging Face (sysuyy/ImgEdit, sysuyy/ImgEdit_recap_mask) and can be downloaded with huggingface-cli download. Archives shipped as split tar parts need to be merged before extraction (cat a.tar.split.* > a.tar).
  • Dependencies: No dependency list is given; loading the data implies torch, transformers, and datasets. Model checkpoints such as ImgEdit_Judge require an environment set up as for Qwen2.5-VL.
  • Related Projects: UniWorld-V1 is a related project inheriting editing capabilities, available on GitHub.
  • Documentation: Links to arXiv papers (2505.20275, 2506.03147) and Hugging Face datasets are provided.
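For environments without a shell, the download and merge steps from the Dataset Access item can be sketched in Python. This is a minimal sketch, not the project's tooling: the helper names `download_dataset` and `merge_tar_splits` are hypothetical, the download helper assumes the `huggingface_hub` package is installed, and the merge helper assumes the parts follow the `a.tar.split.*` naming shown above and sort correctly by name.

```python
import shutil
from pathlib import Path


def download_dataset(repo_id: str, local_dir: str) -> str:
    """Download a dataset repository from Hugging Face (needs network access).

    The import is kept local so the merge helper works without huggingface_hub.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)


def merge_tar_splits(directory: str, tar_name: str) -> Path:
    """Concatenate `<tar_name>.split.*` parts into `<tar_name>`.

    Python equivalent of `cat a.tar.split.* > a.tar`; parts are joined in
    lexicographic order of their file names.
    """
    base = Path(directory)
    parts = sorted(base.glob(f"{tar_name}.split.*"))
    if not parts:
        raise FileNotFoundError(f"no parts matching {tar_name}.split.* in {base}")

    target = base / tar_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so large archives are not held in memory.
                shutil.copyfileobj(src, out)
    return target
```

A typical call sequence would be `download_dataset("sysuyy/ImgEdit", "./ImgEdit")` followed by `merge_tar_splits(...)` on each directory that contains split parts; the exact directory layout inside the repository is not described here and should be checked after download.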

Highlighted Details

  • Features 1.2 million curated image-edit pairs, encompassing novel and complex single-turn edits, alongside challenging multi-turn tasks.
  • ImgEdit-Bench includes three suites: Basic, Understanding-Grounding-Editing (UGE), and Multi-Turn, designed for comprehensive evaluation.
  • The data curation pipeline integrates state-of-the-art models for captioning, segmentation, and editing.
  • ImgEdit-E1, a model trained on the dataset, demonstrates strong performance, highlighting the dataset's value.

Maintenance & Community

Recent news (July 2025) indicates ongoing updates to the ImgEdit-Bench leaderboard with new model integrations. The project has open-sourced related work like UniWorld-V1. No direct community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

No explicit license information is provided in the README. This omission requires clarification for adoption decisions, especially concerning commercial use or derivative works.

Limitations & Caveats

The release of data curation pipelines is marked as "WIP" (Work In Progress). The ImgEdit_Judge component requires specific environment setup aligned with Qwen2.5-VL, and its usage involves custom inference code. The absence of a stated license is a significant adoption blocker.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 30 days
