HiDream-E1 by HiDream-ai

Image editing model for instruction-based manipulation

created 3 months ago
734 stars

Top 48.2% on SourcePulse

Project Summary

HiDream-E1 is an instruction-based image editing model designed for users seeking advanced image manipulation capabilities. It builds upon the HiDream-I1 model, offering enhanced features for transforming images according to textual prompts, targeting researchers and power users in AI-driven creative workflows.

How It Works

HiDream-E1 leverages a diffusion model architecture, incorporating Llama-3.1-8B-Instruct for instruction understanding and an optional transformer model (HiDream-I1-Full) for instruction refinement. The editing process involves a two-stage denoising approach: the initial phase performs the core editing based on the prompt, while the latter phase uses the refinement model to enhance the final output, controlled by a refine_strength parameter. This dual-stage process aims to provide both precise editing and high-quality visual refinement.
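The balance between the two stages can be illustrated with a minimal sketch of how a `refine_strength` fraction might partition a fixed denoising budget. This is an assumption about the scheduling logic, not the pipeline's actual implementation; the function name and split rule are hypothetical.

```python
def split_denoising_steps(num_steps: int, refine_strength: float) -> tuple[int, int]:
    """Split a denoising budget between the editing and refinement stages.

    Illustrative only: assumes the last `refine_strength` fraction of steps
    is handed to the refinement model (HiDream-I1-Full), with the rest used
    for prompt-driven editing. The real scheduler may differ.
    """
    if not 0.0 <= refine_strength <= 1.0:
        raise ValueError("refine_strength must be in [0, 1]")
    refine_steps = int(num_steps * refine_strength)
    edit_steps = num_steps - refine_steps
    return edit_steps, refine_steps

# With 28 total steps and refine_strength=0.3, roughly the last 8 steps
# would go to refinement and the first 20 to editing.
print(split_denoising_steps(28, 0.3))
```

Setting `refine_strength=0.0` under this scheme would disable refinement entirely, while higher values trade editing fidelity for visual polish.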

Quick Start & Requirements

  • Install: pip install -r requirements.txt; pip install -U flash-attn --no-build-isolation; pip install -U git+https://github.com/huggingface/diffusers.git
  • Prerequisites: CUDA 12.4 recommended, Flash Attention, Hugging Face Hub login for Llama-3.1-8B-Instruct model access.
  • Usage: Run inference via python ./inference.py or integrate into custom code. A Gradio demo is available via python gradio_demo.py.
  • Links: HuggingFace repo
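The setup steps above can be run end-to-end roughly as follows. This is a sketch assembled from the listed commands; the `huggingface-cli login` step is an assumed way to satisfy the Hub login prerequisite for Llama-3.1-8B-Instruct access.

```shell
# Install project dependencies, Flash Attention, and the latest diffusers
pip install -r requirements.txt
pip install -U flash-attn --no-build-isolation
pip install -U git+https://github.com/huggingface/diffusers.git

# Authenticate with the Hugging Face Hub (needed for gated Llama-3.1-8B-Instruct)
huggingface-cli login

# Run the bundled inference script, or the Gradio demo
python ./inference.py
# python gradio_demo.py
```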

Highlighted Details

  • Achieves an average score of 6.40 on the EmuEdit benchmark and 7.54 on ReasonEdit.
  • Supports instruction refinement via a VLM API key (local vLLM or the OpenAI API).
  • Offers a refine_strength parameter to balance editing and refinement stages.
  • Models are under active development with frequent updates.
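Since the VLM refinement step targets an OpenAI-compatible endpoint (local vLLM or OpenAI), the request it sends can be sketched as below. The function name and prompt wording are assumptions for illustration, not HiDream-E1's actual refinement template.

```python
def build_refinement_messages(instruction: str) -> list[dict]:
    """Build a chat request asking a VLM to expand a terse edit instruction.

    Hypothetical prompt: the real project may phrase the system message
    differently or include the source image in the request.
    """
    system = (
        "You rewrite image-editing instructions to be explicit and "
        "unambiguous while preserving the user's intent."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction},
    ]

# These messages could then be sent with any OpenAI-compatible client,
# e.g. client.chat.completions.create(model=..., messages=messages),
# pointed either at api.openai.com or a local vLLM server.
messages = build_refinement_messages("make the sky pink")
print(len(messages))
```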

Maintenance & Community

  • The project is open-sourced by HiDream-ai.
  • Further information and updates can be found at https://vivago.ai/.

Licensing & Compatibility

  • The code and models are licensed under the MIT License.
  • Permissive licensing allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The model and code are explicitly stated to be under development and subject to frequent updates, which may introduce breaking changes. Instruction refinement requires a VLM API key, adding an external dependency.

Health Check
Last commit

2 weeks ago

Responsiveness

1 day

Pull Requests (30d)
1
Issues (30d)
6
Star History
321 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations — open-source framework for training large multimodal models (4k stars; created 2 years ago, updated 11 months ago)