UltraEdit by HaozheZhao

Dataset for instruction-based image editing

Created 2 years ago

272 stars

Top 94.6% on SourcePulse

Project Summary

UltraEdit is a large-scale dataset and framework for instruction-based image editing, aimed at researchers and developers in generative AI. It addresses limitations of existing datasets in three ways: it offers a broader range of editing instructions, it uses real-world images for greater diversity, and it supports region-based editing. Models trained on it achieve state-of-the-art performance on image editing benchmarks.

How It Works

UltraEdit uses large language models (LLMs) together with human-curated examples to generate diverse editing instructions. It anchors the data on real photographs and artworks, which reduces bias compared to purely synthetic datasets. The framework supports region-based editing through automatically generated, high-quality masks, giving finer-grained control over which parts of an image change. The goal is to produce a large volume of high-quality image editing samples for training diffusion models.
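To make the idea of combining an LLM with human-curated examples more concrete, the sketch below assembles a few-shot prompt from human-written examples plus a caption of a new real image, which an LLM would then complete with a fresh editing instruction. This is a generic few-shot prompting pattern, not UltraEdit's actual template: the EditExample fields, the build_instruction_prompt helper, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EditExample:
    # Hypothetical record for one human-curated example.
    caption: str         # description of the source image
    instruction: str     # human-written editing instruction
    target_caption: str  # description of the intended edited image


def build_instruction_prompt(examples: Sequence[EditExample], caption: str) -> str:
    """Assemble a few-shot prompt asking an LLM for a new editing instruction.

    Hypothetical format for illustration only; UltraEdit's real prompt is not shown here.
    """
    lines = ["Write an image editing instruction and the caption of the edited image."]
    # In-context examples: each one shows caption -> instruction -> edited caption.
    for ex in examples:
        lines.append(f"Caption: {ex.caption}")
        lines.append(f"Instruction: {ex.instruction}")
        lines.append(f"Edited caption: {ex.target_caption}")
        lines.append("")
    # The new real image's caption; the LLM is expected to continue after "Instruction:".
    lines.append(f"Caption: {caption}")
    lines.append("Instruction:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [EditExample("a dog on grass", "make it snow", "a dog on snowy grass")]
    print(build_instruction_prompt(demo, "a red car parked on a street"))
```

Rotating different human examples into the prompt for each new caption is one simple way to push an LLM toward more varied instructions than a fixed template would produce.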

Quick Start & Requirements

Install by running pip install -r requirements.txt and pip install -e . within the diffusers directory.
Requires Python, PyTorch, and the diffusers library.
Training scripts are provided for Stable Diffusion 3, Stable Diffusion XL, and Stable Diffusion 1.5.
GPU with CUDA is recommended for training and inference.
Example inference code is provided using StableDiffusion3InstructPix2PixPipeline.

Highlighted Details

Dataset comprises ~4 million automatically generated editing samples.
Features a broader range of editing instructions via LLMs and human examples.
Utilizes real images (photographs, artworks) for increased diversity and reduced bias.
Supports region-based editing with high-quality, automatic mask annotations.
Diffusion-based editing baselines trained on UltraEdit set new records on the MagicBrush and Emu-Edit benchmarks.
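As a rough illustration of what a region mask constrains in region-based editing, the sketch below keeps source pixels outside the mask and takes edited pixels inside it. This is plain alpha compositing in NumPy, shown only to illustrate the idea; it is not UltraEdit's mask-generation or model-conditioning code. The function name and the convention that 1 marks the editable region are assumptions.

```python
import numpy as np


def composite_region_edit(source: np.ndarray, edited: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend an edited image into the source only where the mask allows changes.

    source, edited: H x W x C float arrays in [0, 1].
    mask: H x W array in [0, 1]; 1 marks the editable region (assumed convention).
    """
    if source.shape != edited.shape:
        raise ValueError("source and edited images must have the same shape")
    if mask.shape != source.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Add a channel axis so the mask broadcasts over all colour channels.
    m = mask[..., None].astype(source.dtype)
    return m * edited + (1.0 - m) * source


if __name__ == "__main__":
    src = np.zeros((2, 2, 3))               # black source image
    edt = np.ones((2, 2, 3))                # white "edited" image
    msk = np.array([[1.0, 0.0], [0.0, 0.0]])  # only the top-left pixel may change
    print(composite_region_edit(src, edt, msk)[..., 0])
```

Soft mask values between 0 and 1 produce a linear blend at region boundaries, which avoids hard seams where the edited area meets untouched pixels.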

Maintenance & Community

No specific community channels (Discord/Slack) or roadmap are mentioned in the README. The project is associated with authors from various institutions, indicating academic backing.

Licensing & Compatibility

The README does not explicitly state a license. The code is presented as part of an academic research effort, and usage for commercial purposes would require clarification of licensing terms.

Limitations & Caveats

The project appears to be research-oriented, and the dataset generation process relies heavily on LLMs, which may introduce subtle biases or artifacts. Specific hardware requirements for training large models are not detailed beyond the need for GPUs.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days