Step1X-Edit by stepfun-ai

Image editing model comparable to closed-source alternatives

Created 10 months ago

2,139 stars

Top 20.6% on SourcePulse

Project Summary

Step1X-Edit is an open-source image editing model designed to rival closed-source alternatives like GPT-4o and Gemini 2 Flash. It targets researchers and practitioners in AI-powered image manipulation, offering a unified approach to processing user instructions and reference images for high-quality edits.

How It Works

The model uses a multimodal LLM to process the reference image together with the user's editing instruction. From this it extracts a latent embedding, which conditions a diffusion image decoder that generates the edited image. The design is meant to capture the intent of complex editing requests, with the aim of matching leading proprietary systems.
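The data flow described above can be pictured with a toy sketch. Everything here is a hypothetical stand-in: `EditRequest`, `mllm_encode`, `diffusion_decode`, and `edit` are illustrative names, not part of the Step1X-Edit code base, and the arithmetic only mimics the shape of the pipeline (encode image plus instruction into an embedding, then iteratively refine noise under that embedding). It is not the model's actual computation.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class EditRequest:
    image: List[float]   # toy stand-in for reference image pixels
    instruction: str     # the user's editing instruction


def mllm_encode(request: EditRequest, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for the multimodal LLM.

    Maps the (image, instruction) pair to a fixed-size latent embedding.
    """
    seed = sum(ord(ch) for ch in request.instruction)
    rng = random.Random(seed)
    image_mean = sum(request.image) / len(request.image)
    return [image_mean + rng.uniform(-1.0, 1.0) for _ in range(dim)]


def diffusion_decode(embedding: List[float], size: int,
                     steps: int = 10, seed: int = 0) -> List[float]:
    """Toy stand-in for the diffusion decoder.

    Starts from random noise and repeatedly moves it toward a target
    derived from the embedding, loosely mirroring iterative denoising.
    """
    rng = random.Random(seed)
    target = sum(embedding) / len(embedding)
    sample = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for _ in range(steps):
        sample = [x + 0.5 * (target - x) for x in sample]
    return sample


def edit(request: EditRequest) -> List[float]:
    """Run the two stages in order: encode, then decode."""
    embedding = mllm_encode(request)
    return diffusion_decode(embedding, size=len(request.image))


result = edit(EditRequest(image=[0.1, 0.2, 0.3], instruction="make it brighter"))
print(len(result))
```

The point of the sketch is the separation of concerns: the encoder sees both inputs, and the decoder sees only the embedding.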

Quick Start & Requirements

Installation: pip install -r requirements.txt
Prerequisites: Python >= 3.10.0 and PyTorch >= 2.2 with the CUDA toolkit (tested with CUDA 12.1). flash-attn must also be installed, using the script provided in the repository.
Hardware: A GPU with 80GB of memory is recommended for optimal performance. With the FP8 quantized weights, the memory requirement drops to ~18GB.
Demo: Online demo available at https://stepfun-ai.github.io/Step1X-Edit/
Models: Available on ModelScope and HuggingFace.
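Before installing, it may help to confirm that the environment meets the stated version requirements. The following is a minimal sketch using only the standard library; `parse_version` and `check_environment` are hypothetical helper names, and the check covers only the Python and PyTorch versions listed above (it does not verify CUDA or flash-attn).

```python
import sys
from importlib import metadata
from itertools import takewhile


def parse_version(text: str) -> tuple:
    """Return the leading numeric components, e.g. '2.2.0+cu121' -> (2, 2, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment() -> list:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10.0 is required")
    try:
        # Reads installed package metadata without importing torch itself.
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < (2, 2):
            problems.append(f"PyTorch >= 2.2 is required, found {torch_version}")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```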

Highlighted Details

Reported state-of-the-art performance on GEdit-Bench, a new benchmark built from real-world user instructions.
Offers FP8 quantized weights for reduced memory footprint and faster inference.
Supports offloading modules to CPU for further memory savings.
ComfyUI plugin available for integration into existing workflows.
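The memory figures quoted on this page (80GB recommended, ~18GB with FP8 weights, CPU offloading for further savings) suggest a simple rule of thumb for picking a configuration. The sketch below is only a heuristic built from those two numbers; `choose_mode` and the `quantized` and `offload` keys are hypothetical names, not options of the project's scripts, and falling below ~18GB with offloading enabled is not guaranteed to work.

```python
FULL_WEIGHTS_GB = 80   # recommended GPU memory, as stated on this page
FP8_WEIGHTS_GB = 18    # approximate requirement with FP8 quantized weights


def choose_mode(gpu_memory_gb: float) -> dict:
    """Pick a hypothetical configuration from available GPU memory."""
    if gpu_memory_gb >= FULL_WEIGHTS_GB:
        # Enough memory for the recommended setup.
        return {"quantized": False, "offload": False}
    if gpu_memory_gb >= FP8_WEIGHTS_GB:
        # FP8 weights should fit according to the ~18GB figure.
        return {"quantized": True, "offload": False}
    # Below the FP8 figure: offloading to CPU may help, with no guarantee.
    return {"quantized": True, "offload": True}


for memory in (80, 24, 12):
    print(memory, choose_mode(memory))
```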

Maintenance & Community

The project has seen recent community contributions for ComfyUI integration and FP8 model weight updates. Links to community-provided ComfyUI plugins are available.
