HiDream-I1 by HiDream-ai

Image generation model for fast, high-quality results

Created 5 months ago
2,423 stars

Top 19.1% on SourcePulse

Project Summary

HiDream-I1 is a 17B-parameter open-source image generation model designed for fast, high-quality image synthesis. It targets researchers and developers who need state-of-the-art text-to-image capabilities, and it ships in multiple versions that trade speed against quality.

How It Works

HiDream-I1 is a diffusion model that uses Llama-3.1-8B-Instruct as a text encoder, which gives it strong semantic understanding of prompts and control over the generated image. Distilled versions (Dev and Fast) reduce the number of inference steps for faster generation, while the Full version provides maximum quality.

Quick Start & Requirements

  • Install: Run pip install -r requirements.txt, then pip install -U flash-attn --no-build-isolation.
  • Prerequisites: CUDA 12.4 (recommended) and Flash Attention. You must also accept the Llama-3.1-8B-Instruct license and log in with huggingface-cli login.
  • Usage: Run inference with python ./inference.py --model_type <variant>, where <variant> is full, dev, or fast.
  • Diffusers Integration: Install diffusers from source (pip install git+https://github.com/huggingface/diffusers.git) to use the model through the diffusers library.
  • Demo: Run python gradio_demo.py to start a local Gradio demo.
  • Online demo (Hugging Face Space): https://huggingface.co/spaces/HiDream-ai/HiDream-I1-Dev
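
The usage step above can be scripted. The following is a minimal sketch, assuming the command is launched from the repository root and that a python executable is on the PATH; the helper name build_inference_command is hypothetical and not part of the project. It only builds and prints the command line for each variant and does not run the model.

```python
import shlex

# Variant names taken from the usage line above.
MODEL_TYPES = ("full", "dev", "fast")


def build_inference_command(model_type: str, python_exe: str = "python") -> list[str]:
    """Return the argv list for one run of the repository's inference.py.

    Hypothetical helper: it validates the variant name and assembles the
    arguments, but does not launch anything.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of {MODEL_TYPES}"
        )
    return [python_exe, "./inference.py", "--model_type", model_type]


if __name__ == "__main__":
    # Print the shell-quoted command for every variant.
    for variant in MODEL_TYPES:
        print(shlex.join(build_inference_command(variant)))
```

The returned list can be passed to subprocess.run once the prerequisites above are in place; keeping the arguments as a list avoids shell-quoting problems.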

Highlighted Details

  • Reports state-of-the-art results on DPG-Bench (85.89 overall), GenEval (0.83 overall), and HPSv2.1 (33.82 average).
  • Offers three model variants: Full (50 steps), Dev (28 steps), and Fast (16 steps).
  • A companion instruction-based image editing model, HiDream-E1-Full, is also open-sourced.
  • Integrated into Hugging Face's diffusers library.
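
To make the speed and quality trade-off between the three variants concrete, here is a small sketch that works only from the step counts listed above. It treats the number of inference steps as a rough proxy for generation cost, which is an assumption: actual latency depends on hardware and settings and is not reported here. The function names relative_step_cost and pick_variant are hypothetical.

```python
# Step counts as listed above; everything else in this block is illustrative.
INFERENCE_STEPS = {"full": 50, "dev": 28, "fast": 16}


def relative_step_cost(model_type: str, baseline: str = "full") -> float:
    """Ratio of a variant's step count to the baseline's step count."""
    return INFERENCE_STEPS[model_type] / INFERENCE_STEPS[baseline]


def pick_variant(max_steps: int):
    """Return the variant with the most steps that fits within max_steps.

    Assumes more steps means higher quality, in line with the Full variant
    being described as the maximum-quality option. Returns None if even
    the smallest variant exceeds the budget.
    """
    candidates = [
        (steps, name) for name, steps in INFERENCE_STEPS.items() if steps <= max_steps
    ]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    for name in INFERENCE_STEPS:
        print(f"{name}: {relative_step_cost(name):.2f} of the full step count")
    print("variant for a 30-step budget:", pick_variant(30))
```

By this measure the Fast variant runs about a third of the steps of the Full variant (16 of 50), and the Dev variant a little over half (28 of 50).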

Maintenance & Community

  • Active development with recent updates in April 2025.
  • Hugging Face Spaces available for direct interaction.

Licensing & Compatibility

  • Licensed under the MIT License for both code and models.
  • The permissive MIT license allows commercial use and linking from closed-source software.

Limitations & Caveats

Inference requires significant GPU resources, particularly for the Full version. Automatic model downloading depends on having a Hugging Face account and accepting the Llama-3.1-8B-Instruct license terms.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 42 stars in the last 30 days
