Qwen-Image by QwenLM

Image generation model with advanced text rendering

Created 11 months ago

8,099 stars

Top 6.4% on SourcePulse

2 Experts Love This Project

Junyang Lin

Core Maintainer at Alibaba Qwen

Lysandre Debut

Chief Open-Source Officer at Hugging Face

Project Summary

Qwen-Image is a 20B MMDiT foundation model designed for advanced image generation and editing, with a particular emphasis on high-fidelity text rendering in diverse languages. It targets artists, designers, and researchers seeking precise control over visual content creation, offering capabilities beyond standard image generation.

How It Works

Qwen-Image is a diffusion model built on a 20B MMDiT (Multimodal Diffusion Transformer). This architecture allows complex interactions between the text prompt and the visual output, giving precise control over text rendering, object placement, and stylistic consistency. The design prioritizes integrating text seamlessly into images while maintaining typographic integrity and contextual relevance.

Quick Start & Requirements

Install the latest Diffusers from source via pip: pip install git+https://github.com/huggingface/diffusers
Requires transformers>=4.51.3.
Uses CUDA for GPU acceleration when available; otherwise falls back to CPU with torch.float32.
Example usage and deployment options are available in the README.
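
The device fallback described above can be sketched as follows. This is a minimal illustration, not the README's exact code: the helper names pick_device_and_dtype and load_pipeline are hypothetical, the Hub model ID "Qwen/Qwen-Image" is assumed, and using torch.bfloat16 on CUDA is an assumption (the summary only states torch.float32 for the CPU path).

```python
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name) for the pipeline.

    Assumption: bfloat16 is used on CUDA; the CPU path uses float32
    as stated in the requirements above.
    """
    if cuda_available:
        return "cuda", "bfloat16"
    return "cpu", "float32"


def load_pipeline(model_id: str = "Qwen/Qwen-Image"):
    """Load the pipeline with Diffusers (hypothetical helper).

    Heavy imports are deferred so the selection logic above can be
    used and checked without torch or diffusers installed.
    The default model_id is an assumed Hugging Face Hub ID.
    """
    import torch
    from diffusers import DiffusionPipeline

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    pipe = DiffusionPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, dtype_name)
    )
    return pipe.to(device)
```

Calling load_pipeline() downloads the model weights, so it is best run once on a machine with enough disk space and memory for a 20B model; the README remains the reference for actual generation parameters.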

Highlighted Details

Excels at high-fidelity text rendering in both English and Chinese, preserving typographic details and layout.
Supports a wide range of artistic styles, from photorealism to anime.
Enables advanced image editing operations including style transfer, object manipulation, and text editing within images.
Supports image understanding tasks such as object detection, semantic segmentation, and depth estimation.

Maintenance & Community

Actively supported by Hugging Face Diffusers, with LoRA and finetuning support in development.
ModelScope provides optimizations and training tools.
Community platforms include Discord and WeChat groups.
The team is hiring for full-time and intern research positions.

Licensing & Compatibility

Licensed under Apache 2.0.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README states that the editing version of Qwen-Image will be released soon, which implies that the current release focuses on generation. It also warns that the online demos may experience heavy traffic, which suggests high demand.

Health Check

Last Commit

5 months ago

Responsiveness

Inactive

Star History

121 stars in the last 30 days