HiDream-I1 by HiDream-ai

Image generation model for fast, high-quality results

Created 10 months ago

2,500 stars

Top 18.2% on SourcePulse

Project Summary

HiDream-I1 is a 17B parameter open-source image generation model designed for high-quality, fast image synthesis. It targets researchers and developers seeking state-of-the-art text-to-image capabilities, offering multiple model versions for varying speed and quality trade-offs.

How It Works

HiDream-I1 uses a diffusion model architecture with Llama-3.1-8B-Instruct as its text encoder, which gives the generator a strong semantic reading of the prompt and finer control over the output. The distilled Dev and Fast versions cut the number of inference steps for faster generation, while the Full version targets maximum quality.

Quick Start & Requirements

Install: pip install -r requirements.txt followed by pip install -U flash-attn --no-build-isolation.
Prerequisites: CUDA 12.4 (recommended) and Flash Attention. You must accept the Llama-3.1-8B-Instruct license on Hugging Face and then authenticate with huggingface-cli login.
Usage: Run inference via python ./inference.py --model_type full, replacing full with dev or fast to select another variant.
Diffusers Integration: Install from source (pip install git+https://github.com/huggingface/diffusers.git) for seamless integration.
Demo: Available via python gradio_demo.py.
Hosted demo (Hugging Face Space): https://huggingface.co/spaces/HiDream-ai/HiDream-I1-Dev
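The usage line above can be wrapped in a small launcher. This is a minimal sketch, not part of the project: build_inference_command and run_inference are hypothetical helper names, and the sketch assumes the script is started with the Python interpreter from the repository root. Only the --model_type flag and its three values come from the summary.

```python
import subprocess
import sys

# The three values listed for --model_type in the usage line.
MODEL_TYPES = ("full", "dev", "fast")


def build_inference_command(model_type: str, python: str = sys.executable) -> list[str]:
    """Return the argv list for one inference run (hypothetical helper)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}, got {model_type!r}")
    # Assumption: inference.py sits in the current working directory.
    return [python, "./inference.py", "--model_type", model_type]


def run_inference(model_type: str) -> int:
    """Launch the script and return its exit code (needs the repo checkout and a GPU)."""
    return subprocess.run(build_inference_command(model_type), check=False).returncode
```

Validating the variant name before launching avoids starting a long model download only to fail on a typo.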

Highlighted Details

Reports state-of-the-art scores on DPG-Bench (85.89 overall), GenEval (0.83 overall), and HPSv2.1 (33.82 average).
Offers three model variants: Full (50 steps), Dev (28 steps), and Fast (16 steps).
A companion instruction-based image editing model, HiDream-E1-Full, is also open-sourced.
Integrated into Hugging Face's diffusers library.
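The variant list and the diffusers integration can be combined into one hedged sketch. The step counts come from the list above; everything else is an assumption to verify against the diffusers documentation and the upstream README: the repository IDs, the guidance-scale values, the HiDreamImagePipeline arguments tokenizer_4 and text_encoder_4, and the helper names Variant, VARIANTS, and generate. Heavy imports sit inside generate so the lookup table can be used without a GPU.

```python
from dataclasses import dataclass

# Gated repository: the Llama license must be accepted before download (assumed repo ID).
LLAMA_REPO = "meta-llama/Meta-Llama-3.1-8B-Instruct"


@dataclass(frozen=True)
class Variant:
    repo_id: str           # assumed Hugging Face repository name
    steps: int             # inference steps, from the variant list above
    guidance_scale: float  # assumed value; verify upstream


VARIANTS = {
    "full": Variant("HiDream-ai/HiDream-I1-Full", 50, 5.0),
    "dev": Variant("HiDream-ai/HiDream-I1-Dev", 28, 0.0),
    "fast": Variant("HiDream-ai/HiDream-I1-Fast", 16, 0.0),
}


def generate(prompt: str, model_type: str = "full", seed: int = 0):
    """Generate one image; requires a CUDA GPU, torch, transformers, and diffusers."""
    import torch
    from diffusers import HiDreamImagePipeline
    from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

    variant = VARIANTS[model_type]
    tokenizer = PreTrainedTokenizerFast.from_pretrained(LLAMA_REPO)
    text_encoder = LlamaForCausalLM.from_pretrained(
        LLAMA_REPO,
        output_hidden_states=True,  # the pipeline reads intermediate hidden states
        torch_dtype=torch.bfloat16,
    )
    pipe = HiDreamImagePipeline.from_pretrained(
        variant.repo_id,
        tokenizer_4=tokenizer,
        text_encoder_4=text_encoder,
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)  # fixed seed for repeatability
    result = pipe(
        prompt,
        num_inference_steps=variant.steps,
        guidance_scale=variant.guidance_scale,
        generator=generator,
    )
    return result.images[0]
```

A call such as generate("a lighthouse at dusk", "fast").save("out.png") would then write the image to disk, provided the license and login steps from the prerequisites are complete.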

Maintenance & Community

Updates were announced in April 2025; the Health Check section lists the latest commit activity.
Hugging Face Spaces available for direct interaction.

Licensing & Compatibility

Licensed under the MIT License for both code and models.
The permissive MIT license allows commercial use and closed-source linking. The Llama-3.1-8B-Instruct text encoder is a separate download governed by its own license terms.

Limitations & Caveats

The model requires significant GPU resources for inference, particularly the full version. Automatic model downloading depends on Hugging Face account access and agreement to Llama-3.1-8B-Instruct license terms.
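To put "significant GPU resources" into rough numbers, the sketch below estimates the memory needed for model weights alone. It assumes 2 bytes per parameter (bf16 or fp16), which the summary does not state, and it takes the text encoder size as the nominal 8B from the model's name. Activations, the image decoder, and framework overhead are excluded, so real usage will be higher.

```python
GIB = 2**30  # bytes per gibibyte


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights only, in GiB (assumes a uniform precision)."""
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    image_model = weight_memory_gib(17e9)   # 17B parameters, from the project summary
    text_encoder = weight_memory_gib(8e9)   # nominal 8B, taken from the encoder's name
    print(f"image model : {image_model:.1f} GiB")
    print(f"text encoder: {text_encoder:.1f} GiB")
    print(f"combined    : {image_model + text_encoder:.1f} GiB")
```

Under these assumptions the image model weights come to roughly 32 GiB and the text encoder to roughly 15 GiB, which is why a single consumer GPU is unlikely to hold the Full pipeline without offloading or quantization.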

Health Check

Last Commit

7 months ago

Responsiveness

1 day

Star History

6 stars in the last 30 days