HiDream-E1 by HiDream-ai

Image editing model for instruction-based manipulation

Created 6 months ago
778 stars

Top 45.0% on SourcePulse

Project Summary

HiDream-E1 is an instruction-based image editing model built on HiDream-I1. It transforms images according to textual prompts, offering advanced manipulation capabilities aimed at researchers and power users in AI-driven creative workflows.

How It Works

HiDream-E1 leverages a diffusion model architecture, incorporating Llama-3.1-8B-Instruct for instruction understanding and an optional refinement model (HiDream-I1-Full). Editing proceeds in two denoising stages: the initial phase performs the core edit based on the prompt, while the latter phase uses the refinement model to enhance the final output, with the balance between the two controlled by a refine_strength parameter. This dual-stage design aims to provide both precise editing and high-quality visual refinement.
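The two-stage split described above can be sketched in plain Python. The step functions below are illustrative placeholders, not the project's actual API; the sketch only shows how refine_strength divides a fixed denoising budget between the editing stage and the refinement stage.

```python
# Illustrative sketch of the two-stage denoising split controlled by
# refine_strength. All function names here are placeholder assumptions,
# not HiDream-E1's real interface.

def edit_model_step(latent, prompt):
    # Placeholder for one denoising step of the instruction-editing model.
    return latent

def refine_model_step(latent, prompt):
    # Placeholder for one denoising step of the refinement model
    # (HiDream-I1-Full in the actual project).
    return latent

def split_steps(num_steps: int, refine_strength: float) -> tuple[int, int]:
    """Divide the denoising budget between editing and refinement.

    refine_strength = 0.0 -> all steps go to the editing stage;
    refine_strength = 1.0 -> all steps go to the refinement stage.
    """
    refine_steps = int(num_steps * refine_strength)
    edit_steps = num_steps - refine_steps
    return edit_steps, refine_steps

def run_two_stage_edit(latent, prompt, num_steps=28, refine_strength=0.3):
    edit_steps, refine_steps = split_steps(num_steps, refine_strength)
    # Stage 1: core instruction-based editing.
    for _ in range(edit_steps):
        latent = edit_model_step(latent, prompt)
    # Stage 2: visual refinement of the partially denoised result.
    for _ in range(refine_steps):
        latent = refine_model_step(latent, prompt)
    return latent
```

With refine_strength = 0.3 and 28 total steps, 20 steps go to editing and 8 to refinement; raising the value trades editing fidelity for more refinement passes.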

Quick Start & Requirements

  • Install: pip install -r requirements.txt; pip install -U flash-attn --no-build-isolation; pip install -U git+https://github.com/huggingface/diffusers.git
  • Prerequisites: CUDA 12.4 recommended, Flash Attention, Hugging Face Hub login for Llama-3.1-8B-Instruct model access.
  • Usage: Run inference via python ./inference.py or integrate into custom code. A Gradio demo is available via python gradio_demo.py.
  • Links: HuggingFace repo

Highlighted Details

  • Achieves a score of 6.40 on EmuEdit Average and 7.54 on ReasonEdit benchmarks.
  • Supports instruction refinement using a VLM API key (local vllm or OpenAI API).
  • Offers a refine_strength parameter to balance editing and refinement stages.
  • Models are under active development with frequent updates.
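The instruction-refinement option above sends the user's edit instruction to a VLM before it reaches the editor; a local vLLM server exposes an OpenAI-compatible chat endpoint for this. The request-building sketch below is an assumption for illustration — the system prompt and endpoint are not taken from the project.

```python
# Sketch of instruction refinement via an OpenAI-compatible chat API
# (a local vllm server exposes the same interface). The system prompt
# and endpoint below are illustrative assumptions, not HiDream-E1's own.

def build_refinement_messages(instruction: str) -> list[dict]:
    """Build a chat request asking a VLM to rewrite a terse edit
    instruction into a more explicit one before editing begins."""
    system = (
        "You rewrite image-editing instructions to be explicit and "
        "unambiguous while preserving the user's intent."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Refine this edit instruction: {instruction}"},
    ]

# These messages would then be posted, with the API key, to a chat
# completions endpoint (e.g. a local vllm server or the OpenAI API).
```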

Maintenance & Community

  • The project is open-sourced by HiDream-ai.
  • Further information and updates can be found at https://vivago.ai/.

Licensing & Compatibility

  • The code and models are licensed under the MIT License.
  • Permissive licensing allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The model and code are under active development and subject to frequent updates, which may introduce breaking changes. Instruction refinement requires a VLM API key (local vllm or OpenAI API), adding an external dependency.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

