HunyuanImage-2.1  by Tencent-Hunyuan

High-resolution 2K text-to-image generation

Created 2 weeks ago

574 stars

Top 56.3% on SourcePulse

Project Summary

HunyuanImage-2.1 is a text-to-image model for high-resolution (2K) generation with improved text-image alignment and efficiency. Aimed at researchers and power users, it generates detailed, semantically accurate images, accepts multilingual prompts, and includes a prompt-enhancement module.

How It Works

The model uses a two-stage diffusion transformer pipeline: a 17-billion-parameter base model and a refiner. Three components support it: a high-compression (32x) VAE aligned with DINOv2, which keeps 2K generation efficient; dual text encoders, an MLLM and a multilingual ByT5, for better semantic understanding and text rendering; and Reinforcement Learning from Human Feedback (RLHF) for aesthetic refinement. Meanflow distillation speeds up sampling while preserving quality.
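To make the effect of the 32x VAE concrete, the sketch below compares latent grid sizes for a 2048x2048 image. It assumes the compression factor applies to each spatial side and ignores any patchification inside the transformer. The 8x comparison factor is a value commonly used by earlier latent-diffusion VAEs, not something this summary states, and the function name `latent_grid` is hypothetical.

```python
def latent_grid(width: int, height: int, compression: int) -> tuple[int, int, int]:
    """Return (latent_width, latent_height, positions) for a VAE that
    downsamples each spatial side by `compression` (an assumption)."""
    if width % compression or height % compression:
        raise ValueError("image sides must be divisible by the compression factor")
    latent_w, latent_h = width // compression, height // compression
    return latent_w, latent_h, latent_w * latent_h


if __name__ == "__main__":
    # Compare the 32x VAE described above with an assumed 8x baseline.
    for factor in (32, 8):
        w, h, n = latent_grid(2048, 2048, factor)
        print(f"{factor}x VAE: {w}x{h} latent grid, {n} positions")
```

Under these assumptions, a 2048x2048 image maps to a 64x64 grid (4,096 positions) at 32x, versus 256x256 (65,536 positions) at 8x, so the transformer processes 16 times fewer positions.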

Quick Start & Requirements

To install, clone the repository, enter the directory, and install the dependencies:

  • git clone https://github.com/Tencent-Hunyuan/HunyuanImage-2.1.git
  • cd HunyuanImage-2.1
  • pip install -r requirements.txt
  • pip install flash-attn==2.7.3 --no-build-isolation

The model requires Linux, an NVIDIA GPU with CUDA support, and at least 36GB of GPU memory (with CPU offloading). An FP8-quantized model with lower memory usage is planned. Official repository: https://github.com/Tencent-Hunyuan/HunyuanImage-2.1
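Before installing, it can help to confirm that a machine meets the stated requirements. The following is a minimal preflight sketch, not part of the repository: the function names are hypothetical, "36GB" is read as 36 GiB (an assumption), and GPU memory is queried through the `nvidia-smi` command-line tool, which reports totals in MiB.

```python
import platform
import subprocess

MIN_GPU_MIB = 36 * 1024  # assumption: "36GB" interpreted as 36 GiB


def parse_memory_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one total-memory value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def meets_requirements(system: str, gpu_memory_mib: list[int]) -> bool:
    """True if the OS is Linux and at least one GPU has enough memory."""
    return system == "Linux" and any(m >= MIN_GPU_MIB for m in gpu_memory_mib)


def query_gpu_memory_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; return [] if it is unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        return parse_memory_mib(result.stdout)
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        return []


if __name__ == "__main__":
    ok = meets_requirements(platform.system(), query_gpu_memory_mib())
    print("requirements met" if ok else "requirements not met")
```

The check only covers the operating system and GPU memory; it does not verify the CUDA toolkit version or whether `flash-attn` will build on the machine.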

Highlighted Details

  • Generates ultra-high-definition (2K) images with cinematic composition.
  • Supports both Chinese and English prompts natively, with glyph-aware text rendering via ByT5.
  • Offers flexible aspect ratio support (1:1, 16:9, 9:16, etc.).
  • Features an automatic prompt rewriting module (PromptEnhancer) for improved descriptive accuracy.
  • Achieves state-of-the-art semantic alignment among open-source models per SSAE evaluation, comparable to closed-source alternatives in GSB benchmarks.

Maintenance & Community

The project acknowledges contributions from Qwen, FLUX, diffusers, and HuggingFace. The README does not list community channels (e.g., Discord, Slack) or a detailed roadmap. Inference code and weights were released on September 8, 2025.

Licensing & Compatibility

No specific license information was provided in the README excerpt.

Limitations & Caveats

The model is restricted to Linux environments and exclusively supports 2K resolution generation; lower resolutions produce artifacts. It demands substantial GPU memory (36GB minimum), although an FP8 version is planned to mitigate this.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 30
  • Star history: 577 stars in the last 14 days
