HiDream-E1 by HiDream-ai

Image editing model for instruction-based manipulation

created 3 months ago
734 stars

Top 48.2% on SourcePulse

Project Summary

HiDream-E1 is an instruction-based image editing model designed for users seeking advanced image manipulation capabilities. It builds upon the HiDream-I1 model, offering enhanced features for transforming images according to textual prompts, targeting researchers and power users in AI-driven creative workflows.

How It Works

HiDream-E1 leverages a diffusion model architecture, incorporating Llama-3.1-8B-Instruct for instruction understanding and an optional transformer model (HiDream-I1-Full) for instruction refinement. The editing process involves a two-stage denoising approach: the initial phase performs the core editing based on the prompt, while the latter phase uses the refinement model to enhance the final output, controlled by a refine_strength parameter. This dual-stage process aims to provide both precise editing and high-quality visual refinement.
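The balance between the two stages can be illustrated with a minimal sketch of how a `refine_strength` fraction might partition a fixed denoising budget. This is an assumption about the scheduling logic, not the pipeline's actual implementation; the function name and split rule are hypothetical.

```python
def split_denoising_steps(num_steps: int, refine_strength: float) -> tuple[int, int]:
    """Split a denoising budget between the editing and refinement stages.

    Illustrative only: assumes the last `refine_strength` fraction of steps
    is handed to the refinement model (HiDream-I1-Full), with the rest used
    for prompt-driven editing. The real scheduler may differ.
    """
    if not 0.0 <= refine_strength <= 1.0:
        raise ValueError("refine_strength must be in [0, 1]")
    refine_steps = int(num_steps * refine_strength)
    edit_steps = num_steps - refine_steps
    return edit_steps, refine_steps

# With 28 total steps and refine_strength=0.3, roughly the last 8 steps
# would go to refinement and the first 20 to editing.
print(split_denoising_steps(28, 0.3))
```

Setting `refine_strength=0.0` under this scheme would disable refinement entirely, while higher values trade editing fidelity for visual polish.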

Quick Start & Requirements

  • Install: pip install -r requirements.txt; pip install -U flash-attn --no-build-isolation; pip install -U git+https://github.com/huggingface/diffusers.git
  • Prerequisites: CUDA 12.4 recommended, Flash Attention, Hugging Face Hub login for Llama-3.1-8B-Instruct model access.
  • Usage: Run inference via python ./inference.py or integrate into custom code. A Gradio demo is available via python gradio_demo.py.
  • Links: HuggingFace repo
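The setup steps above can be run end-to-end roughly as follows. This is a sketch assembled from the listed commands; the `huggingface-cli login` step is an assumed way to satisfy the Hub login prerequisite for Llama-3.1-8B-Instruct access.

```shell
# Install project dependencies, Flash Attention, and the latest diffusers
pip install -r requirements.txt
pip install -U flash-attn --no-build-isolation
pip install -U git+https://github.com/huggingface/diffusers.git

# Authenticate with the Hugging Face Hub (needed for gated Llama-3.1-8B-Instruct)
huggingface-cli login

# Run the bundled inference script, or the Gradio demo
python ./inference.py
# python gradio_demo.py
```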

Highlighted Details

  • Achieves an average score of 6.40 on the EmuEdit benchmark and 7.54 on ReasonEdit.
  • Supports instruction refinement via a VLM API key (local vLLM or the OpenAI API).
  • Offers a refine_strength parameter to balance editing and refinement stages.
  • Models are under active development with frequent updates.
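Since the VLM refinement step targets an OpenAI-compatible endpoint (local vLLM or OpenAI), the request it sends can be sketched as below. The function name and prompt wording are assumptions for illustration, not HiDream-E1's actual refinement template.

```python
def build_refinement_messages(instruction: str) -> list[dict]:
    """Build a chat request asking a VLM to expand a terse edit instruction.

    Hypothetical prompt: the real project may phrase the system message
    differently or include the source image in the request.
    """
    system = (
        "You rewrite image-editing instructions to be explicit and "
        "unambiguous while preserving the user's intent."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction},
    ]

# These messages could then be sent with any OpenAI-compatible client,
# e.g. client.chat.completions.create(model=..., messages=messages),
# pointed either at api.openai.com or a local vLLM server.
messages = build_refinement_messages("make the sky pink")
print(len(messages))
```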

Maintenance & Community

  • The project is open-sourced by HiDream-ai.
  • Further information and updates can be found at https://vivago.ai/.

Licensing & Compatibility

  • The code and models are licensed under the MIT License.
  • Permissive licensing allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The model and code are explicitly stated to be under development and subject to frequent updates, which may introduce breaking changes. Instruction refinement requires a VLM API key, adding an external dependency.

Health Check
Last commit

2 weeks ago

Responsiveness

1 day

Pull Requests (30d)
1
Issues (30d)
6
Star History
321 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations — open-source framework for training large multimodal models (4k stars; created 2 years ago, updated 11 months ago)