ICEdit by River-Zhang

Image editing with LoRA fine-tuning

Created 10 months ago

2,079 stars

Top 21.0% on SourcePulse

Project Summary

ICEdit performs instruction-based image editing and reports state-of-the-art results while using far less training data (0.5%) and far fewer parameters (1%) than prior methods. It targets researchers and users who want efficient, high-fidelity image manipulation, and it claims performance comparable or superior to commercial models in identity preservation and instruction following.

How It Works

ICEdit applies an in-context generation approach within a Diffusion Transformer architecture. It is trained on a much smaller dataset, 0.5% of the data used by prior methods, which keeps training inexpensive. The method emphasizes precise instruction adherence and identity preservation, and it is reported to outperform models such as GPT-4o on both.
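To make "in-context generation" more concrete, here is a minimal sketch of one common way to set up instruction editing on top of a fill (inpainting) model: the source image sits in the left panel of a two-panel canvas, the right panel is masked, and the prompt describes the right panel as an edited copy of the left. The summary does not describe ICEdit's exact layout or prompt wording, so the template text and the helper names `build_diptych_prompt` and `diptych_mask` are assumptions for illustration only.

```python
from typing import List

# Assumed prompt wording; not taken from the project summary.
DIPTYCH_TEMPLATE = (
    "A diptych with two side-by-side images of the same scene. "
    "On the right, the scene is exactly the same as on the left but {instruction}"
)


def build_diptych_prompt(instruction: str) -> str:
    """Wrap an edit instruction in the (assumed) diptych template."""
    return DIPTYCH_TEMPLATE.format(instruction=instruction.strip())


def diptych_mask(width: int, height: int) -> List[List[int]]:
    """Build a mask for a canvas twice as wide as the source image.

    0 marks the left panel (the source image, kept as context);
    1 marks the right panel (the region the fill model generates).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    row = [0] * width + [1] * width
    return [list(row) for _ in range(height)]


if __name__ == "__main__":
    print(build_diptych_prompt("make the sky pink"))
    print(diptych_mask(4, 2)[0])
```

In a setup like this, the edited result would be read from the right panel after generation, and the left panel gives the model the reference it needs to keep identity and layout unchanged.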

Quick Start & Requirements

Install: run pip install -r requirements.txt, then pip install -U huggingface_hub.
Prerequisites: Python 3.10 (a Conda environment is recommended) and pretrained weights for Flux.1-fill-dev and ICEdit-normal-LoRA.
Hardware: 4GB of VRAM is sufficient for the ComfyUI-nunchaku workflow. Standard inference on a 512x768 image requires 35GB of VRAM; the --enable-model-cpu-offload option lets it run on 24GB GPUs.
Resources: an official Hugging Face demo, ComfyUI workflows, and the paper.
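The hardware figures above can be read as a simple decision rule. The sketch below encodes them as hard cutoffs, which is an assumption: real memory use depends on resolution, precision, and drivers. The names `InferencePlan` and `choose_plan` are hypothetical helpers, not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferencePlan:
    workflow: str      # "standard" or "comfyui-nunchaku"
    cpu_offload: bool  # whether to pass --enable-model-cpu-offload


def choose_plan(vram_gb: float) -> InferencePlan:
    """Pick a way to run inference from the VRAM figures quoted in the summary.

    Thresholds (35, 24, 4 GB) come from the summary and are treated as
    hard cutoffs here, which is a simplification.
    """
    if vram_gb >= 35:
        return InferencePlan(workflow="standard", cpu_offload=False)
    if vram_gb >= 24:
        return InferencePlan(workflow="standard", cpu_offload=True)
    if vram_gb >= 4:
        return InferencePlan(workflow="comfyui-nunchaku", cpu_offload=False)
    raise ValueError("less than 4GB of VRAM: no workflow listed in the summary fits")


if __name__ == "__main__":
    for gb in (48, 24, 8):
        print(gb, choose_plan(gb))
```

A GPU between 4GB and 24GB falls through to the ComfyUI-nunchaku workflow because that is the only low-memory path the summary mentions.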

Highlighted Details

Achieves state-of-the-art instruction-based editing with 0.5% of the training data and 1% of the parameters of prior methods.
Outperforms commercial models like GPT-4o in identity persistence and instruction following.
Offers fast inference (approx. 9 seconds per image) and low cost.
Supports ComfyUI integration with workflows for both standard LoRA and moe-lora (moe-lora weights temporarily withdrawn).
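The small parameter footprint in the first highlight fits how LoRA adapters work in general: instead of updating a full weight matrix, LoRA trains two low-rank factors. The arithmetic below is a generic illustration of that saving. The layer size and rank are made-up values, and the summary does not say how its 1% figure is computed, so this is not a reconstruction of the project's numbers.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the dense layer's parameter count that the adapter trains."""
    return lora_param_count(d_in, d_out, rank) / full_param_count(d_in, d_out)


if __name__ == "__main__":
    # Illustrative values only, not the project's actual configuration.
    d_in, d_out, rank = 3072, 3072, 32
    print(lora_param_count(d_in, d_out, rank))        # 196608
    print(full_param_count(d_in, d_out))              # 9437184
    print(round(lora_fraction(d_in, d_out, rank), 4)) # 0.0208
```

For a square layer the fraction simplifies to 2 * rank / d, so halving the rank halves the trainable share.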

Maintenance & Community

Active development with recent updates and community contributions (e.g., ComfyUI workflows).
Hugging Face trending #2 weekly as of May 6, 2025.
Chinese tutorial video available.
Project Page.

Licensing & Compatibility

The README does not state a license. The BibTeX entry points to an arXiv paper, which suggests a research orientation but is not itself a license. Terms for commercial use or closed-source linking are not specified.

Limitations & Caveats

The model is trained primarily on realistic images, so performance may degrade on non-realistic styles such as anime and on blurry pictures. The success rate for object removal is noted as relatively lower because of dataset limitations. The original moe-lora weights are temporarily withdrawn due to cooperation issues.

Health Check

Last Commit: 2 months ago
Responsiveness: 1 day

Star History

14 stars in the last 30 days