ICEdit by River-Zhang

Image editing with LoRA fine-tuning

Created 4 months ago
1,929 stars

Top 22.7% on SourcePulse

Project Summary

ICEdit enables state-of-the-art instruction-based image editing using significantly less training data and parameters than prior methods. It targets researchers and users seeking efficient, high-fidelity image manipulation, offering comparable or superior performance to commercial models in identity preservation and instruction following.

How It Works

ICEdit uses an in-context generation approach within a Diffusion Transformer (DiT) architecture, adapted through LoRA fine-tuning. It trains on 0.5% of the data and 1% of the parameters used by prior methods. The method focuses on precise instruction adherence and identity preservation, and the project reports outperforming models such as GPT-4o in both.
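As a rough illustration of what in-context editing can look like, the sketch below assumes a diptych-style setup: the source image sits in the left half of a double-width canvas, the right half is masked for an inpainting model to fill, and the edit instruction is embedded in a prompt that describes both halves. The layout, the prompt wording, and the names DiptychLayout, diptych_layout, and diptych_prompt are all assumptions made for illustration. None of them is confirmed by this summary, so the project's paper and code remain the authority on the actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiptychLayout:
    """Geometry of a hypothetical two-panel canvas (all boxes are left, top, right, bottom)."""
    canvas_size: tuple[int, int]           # (width, height) of the combined canvas
    source_box: tuple[int, int, int, int]  # where the unedited source image is pasted
    mask_box: tuple[int, int, int, int]    # region the inpainting model is asked to fill


def diptych_layout(width: int, height: int) -> DiptychLayout:
    """Place the source on the left and reserve an equally sized masked panel on the right."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return DiptychLayout(
        canvas_size=(2 * width, height),
        source_box=(0, 0, width, height),
        mask_box=(width, 0, 2 * width, height),
    )


def diptych_prompt(instruction: str) -> str:
    """Wrap an edit instruction in an assumed diptych-style prompt template."""
    cleaned = instruction.strip().rstrip(".")
    if not cleaned:
        raise ValueError("instruction must not be empty")
    return (
        "A diptych with two side-by-side images of the same scene. "
        f"On the right, the scene is the same as on the left, but {cleaned}."
    )


layout = diptych_layout(512, 768)
prompt = diptych_prompt("make the jacket red.")
```

In this sketch the model never receives a separate "source image" input: the source is simply visible context in the unmasked half, which is one way to read the term "in-context generation".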

Quick Start & Requirements

  • Install: pip install -r requirements.txt and pip install -U huggingface_hub.
  • Prerequisites: Python 3.10, Conda environment recommended. Pretrained weights for Flux.1-fill-dev and ICEdit-normal-LoRA are required.
  • Hardware: 4 GB of VRAM is sufficient for the ComfyUI-nunchaku workflow. Standard inference on a 512x768 image requires 35 GB of VRAM; the --enable-model-cpu-offload option lets it run on 24 GB GPUs.
  • Resources: an official Hugging Face demo, ComfyUI workflows, and the paper.
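The hardware figures above amount to a small decision table. The sketch below encodes them as a helper that maps available GPU memory to one of the three setups. The thresholds (35 GB, 24 GB, 4 GB) come from the list above; the function names and the returned labels are hypothetical, and the 35 GB figure applies to a 512x768 image, so larger inputs may need more.

```python
from typing import Optional


def detect_vram_gb() -> Optional[float]:
    """Return total memory of GPU 0 in GB, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # total_memory is reported in bytes
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def pick_inference_mode(vram_gb: float) -> str:
    """Map available VRAM to a setup named in the requirements (labels are illustrative)."""
    if vram_gb >= 35:
        return "standard"
    if vram_gb >= 24:
        return "standard with --enable-model-cpu-offload"
    if vram_gb >= 4:
        return "ComfyUI-nunchaku workflow"
    return "unsupported"
```

For example, pick_inference_mode(24) selects the CPU-offload variant, while a card with 8 GB falls through to the ComfyUI-nunchaku workflow.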

Highlighted Details

  • Achieves state-of-the-art instruction-based editing with minimal training data (0.5%) and parameters (1%).
  • Reported to outperform commercial models such as GPT-4o in identity preservation and instruction following.
  • Offers fast inference (approx. 9 seconds per image) and low cost.
  • Supports ComfyUI integration with workflows for both standard LoRA and moe-lora (moe-lora weights temporarily withdrawn).
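To give a feel for why LoRA fine-tuning keeps the trainable parameter share small, the sketch below computes the fraction of parameters a single LoRA adapter adds to one weight matrix. It relies on the standard LoRA formulation, in which a d_out x d_in weight gets two low-rank factors of shapes d_out x r and r x d_in. The layer size and rank in the example are hypothetical and are not taken from ICEdit, so the result does not reproduce the project's 1% figure.

```python
def lora_param_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of extra trainable parameters LoRA adds to one d_out x d_in weight."""
    if min(d_out, d_in, rank) <= 0:
        raise ValueError("dimensions and rank must be positive")
    full_params = d_out * d_in            # frozen base weight
    lora_params = rank * (d_out + d_in)   # factors B (d_out x r) and A (r x d_in)
    return lora_params / full_params


# Hypothetical example: a square 3072 x 3072 projection with rank 32.
fraction = lora_param_fraction(3072, 3072, 32)
print(f"{fraction:.2%}")  # prints "2.08%"
```

The overall share for a full model also depends on which layers receive adapters, since layers without adapters contribute only frozen parameters.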

Maintenance & Community

  • Active development with recent updates and community contributions (e.g., ComfyUI workflows).
  • Hugging Face trending #2 weekly as of May 6, 2025.
  • Chinese tutorial video available.
  • Project page available.

Licensing & Compatibility

  • The README does not explicitly state a license. The BibTeX entry points to an arXiv paper, which suggests a research orientation but is not a license. Whether commercial use or closed-source linking is permitted is not specified.

Limitations & Caveats

The model is trained primarily on realistic images, so performance may degrade on non-realistic styles such as anime and on blurry pictures. The success rate for object removal is noted as relatively low because of dataset limitations. The original moe-lora weights are temporarily withdrawn due to cooperation issues.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 44 stars in the last 30 days
