HiDream-I1 by HiDream-ai

Image generation model for fast, high-quality results

created 3 months ago
2,380 stars

Top 19.7% on sourcepulse

Project Summary

HiDream-I1 is a 17B-parameter open-source image generation model designed for fast, high-quality image synthesis. It targets researchers and developers who need text-to-image generation, and it ships in several versions that trade generation speed against output quality.

How It Works

HiDream-I1 is a diffusion model that uses Llama-3.1-8B-Instruct as a text encoder, which is intended to give it a strong semantic understanding of prompts and finer control over the generated image. Two distilled versions (Dev and Fast) need fewer inference steps and therefore generate images faster, while the Full version runs the most steps and targets maximum quality.

Quick Start & Requirements

  • Install: run pip install -r requirements.txt, then pip install -U flash-attn --no-build-isolation.
  • Prerequisites: Flash Attention, with CUDA 12.4 recommended. You must accept the Llama-3.1-8B-Instruct license and log in with huggingface-cli login.
  • Usage: run inference with python ./inference.py --model_type full (or dev, or fast).
  • Diffusers Integration: install diffusers from source (pip install git+https://github.com/huggingface/diffusers.git) to use the model through that library.
  • Demo: run python gradio_demo.py.
  • Docs: https://huggingface.co/spaces/HiDream-ai/HiDream-I1-Dev (a Hugging Face Space for the Dev version)
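
The usage step above can be scripted. The following is a minimal sketch, assuming it is run from a checkout of the repository where inference.py lives; the helper name build_inference_command is hypothetical and only wraps the --model_type flag listed above.

```python
import shlex
import sys

# Model types named in the usage line above.
VALID_MODEL_TYPES = ("full", "dev", "fast")


def build_inference_command(model_type: str, python: str = sys.executable) -> list[str]:
    """Return the argv list for one inference.py run (hypothetical helper)."""
    if model_type not in VALID_MODEL_TYPES:
        raise ValueError(
            f"model_type must be one of {VALID_MODEL_TYPES}, got {model_type!r}"
        )
    return [python, "./inference.py", "--model_type", model_type]


if __name__ == "__main__":
    cmd = build_inference_command("fast")
    # Print the command; pass it to subprocess.run(cmd, check=True)
    # from inside the repository checkout to actually run inference.
    print(shlex.join(cmd))
```

Validating the model type before launching avoids paying the model download and load cost for a mistyped flag.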

Highlighted Details

  • Reports state-of-the-art scores on DPG-Bench (85.89 overall), GenEval (0.83 overall), and HPSv2.1 (33.82 average).
  • Offers three model variants: Full (50 steps), Dev (28 steps), and Fast (16 steps).
  • An open-source instruction-based image editing model, HiDream-E1-Full, is also available.
  • Integrated into Hugging Face's diffusers library.
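
The variant list and the diffusers integration can be combined into a short sketch. It assumes the Hugging Face repository IDs follow the pattern HiDream-ai/HiDream-I1-{Full,Dev,Fast} and that diffusers exposes a HiDreamImagePipeline taking the Llama tokenizer and encoder as tokenizer_4 and text_encoder_4; both are assumptions to check against the upstream README. The names Variant, pick_variant, and generate_image are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    repo_id: str  # assumed Hugging Face repo ID, not confirmed by this page
    steps: int    # inference steps listed for the variant above


VARIANTS = {
    "full": Variant("HiDream-ai/HiDream-I1-Full", 50),
    "dev": Variant("HiDream-ai/HiDream-I1-Dev", 28),
    "fast": Variant("HiDream-ai/HiDream-I1-Fast", 16),
}


def pick_variant(max_steps: int) -> str:
    """Return the variant with the most steps that fits within max_steps."""
    by_steps = sorted(VARIANTS.items(), key=lambda kv: kv[1].steps, reverse=True)
    for name, variant in by_steps:
        if variant.steps <= max_steps:
            return name
    raise ValueError(f"no variant runs in {max_steps} steps or fewer")


def generate_image(prompt: str, variant_name: str, out_path: str = "output.png") -> None:
    """Generate one image; needs a large GPU, diffusers from source, and Llama access."""
    # Heavy imports are deferred so the helpers above work without a GPU stack.
    import torch
    from diffusers import HiDreamImagePipeline
    from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

    llama_id = "meta-llama/Llama-3.1-8B-Instruct"
    tokenizer = PreTrainedTokenizerFast.from_pretrained(llama_id)
    text_encoder = LlamaForCausalLM.from_pretrained(
        llama_id, output_hidden_states=True, torch_dtype=torch.bfloat16
    )
    variant = VARIANTS[variant_name]
    pipe = HiDreamImagePipeline.from_pretrained(
        variant.repo_id,
        tokenizer_4=tokenizer,
        text_encoder_4=text_encoder,
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    image = pipe(prompt, num_inference_steps=variant.steps).images[0]
    image.save(out_path)
```

For example, pick_variant(30) selects "dev", since Full needs 50 steps and Dev needs 28. Other sampling settings such as guidance scale are left at the pipeline defaults here and may need per-variant tuning.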

Maintenance & Community

  • Active development with recent updates in April 2025.
  • Hugging Face Spaces available for direct interaction.

Licensing & Compatibility

  • Licensed under the MIT License for both code and models.
  • Compatible with commercial use and closed-source linking due to permissive MIT license.

Limitations & Caveats

Inference requires significant GPU resources, particularly for the Full version. Automatic model downloading depends on Hugging Face account access and on accepting the Llama-3.1-8B-Instruct license terms.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 5

Star History

  • 514 stars in the last 90 days
