Step1X-Edit by stepfun-ai

Image editing model comparable to closed-source alternatives

created 3 months ago
1,549 stars

Top 27.4% on sourcepulse

Project Summary

Step1X-Edit is an open-source image editing model designed to rival closed-source alternatives like GPT-4o and Gemini 2 Flash. It targets researchers and practitioners in AI-powered image manipulation, offering a unified approach to processing user instructions and reference images for high-quality edits.

How It Works

The model uses a multimodal LLM to process the reference image together with the user's editing instruction and to extract a latent embedding from them. That embedding then conditions a diffusion image decoder, which generates the edited target image. Routing the instruction through the multimodal LLM is what lets the model handle complex editing requests, with the stated aim of matching leading proprietary systems.
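The data flow described above can be sketched with toy stand-ins. This is only an illustration of the two-stage shape (encode the request into an embedding, then decode conditioned on it); the names `EditRequest`, `encode_with_mllm`, `decode_with_diffusion`, and `edit_image` are hypothetical and are not part of the Step1X-Edit codebase, and the arithmetic inside them is a placeholder rather than real model computation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EditRequest:
    image: Sequence[float]  # stand-in for reference image pixels
    instruction: str        # user editing instruction


def encode_with_mllm(request: EditRequest, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for the multimodal LLM.

    Fuses the image and the instruction into a single latent embedding.
    """
    text_signal = (sum(ord(c) for c in request.instruction) % 97) / 97.0
    image_signal = sum(request.image) / max(len(request.image), 1)
    return [(text_signal + image_signal) * (i + 1) / dim for i in range(dim)]


def decode_with_diffusion(
    image: Sequence[float], embedding: Sequence[float], steps: int = 3
) -> List[float]:
    """Hypothetical stand-in for the diffusion image decoder.

    Iteratively refines the image, conditioned on the embedding.
    """
    condition = sum(embedding) / len(embedding)
    output = list(image)
    for _ in range(steps):
        # Each step moves every pixel halfway toward the conditioning value.
        output = [0.5 * pixel + 0.5 * condition for pixel in output]
    return output


def edit_image(request: EditRequest) -> List[float]:
    """Two-stage flow: instruction + image -> embedding -> edited image."""
    embedding = encode_with_mllm(request)
    return decode_with_diffusion(request.image, embedding)
```

Calling `edit_image(EditRequest(image=[0.0, 0.5, 1.0], instruction="make the sky blue"))` returns a list with one value per input pixel; the point is the interface between the two stages, not the numbers.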

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python >= 3.10.0 and PyTorch >= 2.2 with the CUDA toolkit (tested with CUDA 12.1). flash-attn must also be installed using the provided script.
  • Hardware: 80GB of GPU memory is recommended for best performance. The FP8 quantized weights reduce the memory requirement to ~18GB.
  • Demo: Online demo available at https://stepfun-ai.github.io/Step1X-Edit/
  • Models: Available on ModelScope and HuggingFace.
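Before installing, it can help to confirm that the environment meets the version floors listed above. The following is a minimal sketch, not a script shipped with the project: `version_tuple` and `check_environment` are hypothetical helpers, and the check covers only the Python and PyTorch versions, not CUDA or flash-attn.

```python
import re
import sys

MIN_PYTHON = (3, 10, 0)  # from the prerequisites above
MIN_TORCH = (2, 2)       # from the prerequisites above


def version_tuple(text):
    """Parse the leading numeric part of a version string.

    For example, '2.2.0+cu121' becomes (2, 2, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_environment(python_version=None, torch_version=None):
    """Return a list of human-readable problems (empty if none found).

    Both arguments are optional overrides; when omitted, the running
    interpreter and the installed torch package are inspected.
    """
    problems = []

    py = tuple(python_version or sys.version_info[:3])
    if py < MIN_PYTHON:
        problems.append(f"Python {py} is older than {MIN_PYTHON}")

    if torch_version is None:
        try:
            import torch  # imported lazily so the check runs without it
            torch_version = str(torch.__version__)
        except ImportError:
            problems.append("PyTorch is not installed")

    if torch_version is not None and version_tuple(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than {MIN_TORCH}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Passing explicit versions, as in `check_environment((3, 9, 0), "2.1.0")`, makes the helper easy to exercise without changing the local setup.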

Highlighted Details

  • State-of-the-art performance on GEdit-Bench, a new benchmark built from real-world user instructions.
  • Offers FP8 quantized weights for reduced memory footprint and faster inference.
  • Supports offloading modules to CPU for further memory savings.
  • ComfyUI plugin available for integration into existing workflows.
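The memory options above suggest a simple rule of thumb for picking a configuration from the available GPU memory. The sketch below is an assumption layered on the two figures quoted on this page (80GB recommended, ~18GB with FP8 weights); the function `suggest_memory_mode` and the `quantized` / `offload` keys are hypothetical names, not flags documented by the project, and the thresholds are not official guidance.

```python
FULL_PRECISION_GB = 80  # recommended GPU memory quoted above
FP8_GB = 18             # approximate requirement with FP8 weights quoted above


def suggest_memory_mode(gpu_memory_gb):
    """Suggest memory-saving options for a given amount of GPU memory.

    Returns a dict with two hypothetical switches:
      quantized: use the FP8 quantized weights
      offload:   offload modules to CPU for further savings
    """
    if gpu_memory_gb <= 0:
        raise ValueError("gpu_memory_gb must be positive")

    if gpu_memory_gb >= FULL_PRECISION_GB:
        # Enough memory for the recommended setup; no savings needed.
        return {"quantized": False, "offload": False}
    if gpu_memory_gb >= FP8_GB:
        # Below the recommended size: fall back to FP8 weights.
        return {"quantized": True, "offload": False}
    # Below the FP8 figure: enable both options. This may still not fit.
    return {"quantized": True, "offload": True}
```

For example, `suggest_memory_mode(24)` selects FP8 weights without offloading. Actual peak memory depends on image resolution and other settings, so treat the result as a starting point to verify against the repository's own measurements.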

Maintenance & Community

The project has seen recent community contributions for ComfyUI integration and FP8 model weight updates. Links to community-provided ComfyUI plugins are available.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Fine-tuning scripts and Diffusers integration are not yet released. Multi-GPU sequence parallel inference is also planned but not yet available.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 10

Star History

531 stars in the last 90 days
