ImgEdit by PKU-YuanGroup

Created 1 year ago

324 stars

Top 83.7% on SourcePulse

Project Summary

ImgEdit provides a large-scale, high-quality dataset and a comprehensive benchmark suite for image editing tasks, addressing the need for standardized training and evaluation in this domain. It targets AI researchers and developers working on generative models, offering a unified platform for advancing single-turn and multi-turn image manipulation capabilities. The project aims to facilitate the development of more sophisticated and instruction-adherent image editing models.

How It Works

The ImgEdit dataset is curated through a multi-stage pipeline. This process begins with filtering the Laion-aes dataset based on aesthetic scores, followed by dense and short caption generation using vision-language models like Qwen2.5VL-7B and GPT-4o. Object detection and segmentation are performed using YOLO-world and SAM2, with CLIP filtering applied. Diverse editing prompts are generated via GPT-4o, and task-specific editing pipelines, potentially leveraging ComfyUI and Stable Diffusion, are employed. Data quality is further refined through GPT-4o-based filtering. The ImgEdit-Bench benchmark evaluates models across basic, Understanding-Grounding-Editing (UGE), and multi-turn editing suites, assessing instruction adherence, editing quality, and content memory.

Quick Start & Requirements

Dataset Access: Datasets are available on Hugging Face (sysuyy/ImgEdit, sysuyy/ImgEdit_recap_mask) or can be downloaded via huggingface-cli download. Tar packages may require merging (cat a.tar.split.* > a.tar).
Dependencies: Implied dependencies include torch, transformers, and datasets for loading data. Specific model checkpoints (e.g., ImgEdit_Judge) require environment setup following Qwen2.5-VL.
Related Projects: UniWorld-V1 is a related project inheriting editing capabilities, available on GitHub.
Documentation: Links to arXiv papers (2505.20275, 2506.03147) and Hugging Face datasets are provided.

Highlighted Details

Features 1.2 million curated image-edit pairs, encompassing novel and complex single-turn edits, alongside challenging multi-turn tasks.
ImgEdit-Bench includes three suites: Basic, Understanding-Grounding-Editing (UGE), and Multi-Turn, designed for comprehensive evaluation.
The data curation pipeline integrates state-of-the-art models for captioning, segmentation, and editing.
ImgEdit-E1, a model trained on the dataset, demonstrates strong performance, highlighting the dataset's value.

Maintenance & Community

Recent news (July 2025) indicates ongoing updates to the ImgEdit-Bench leaderboard with new model integrations. The project has open-sourced related work like UniWorld-V1. No direct community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

No explicit license information is provided in the README. This omission requires clarification for adoption decisions, especially concerning commercial use or derivative works.

Limitations & Caveats

The release of data curation pipelines is marked as "WIP" (Work In Progress). The ImgEdit_Judge component requires specific environment setup aligned with Qwen2.5-VL, and its usage involves custom inference code. The absence of a stated license is a significant adoption blocker.

Health Check

Last Commit

8 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

9 stars in the last 30 days