ERNIE-Image by baidu

Advanced text-to-image generation model

Created 3 months ago

493 stars

Top 62.0% on SourcePulse

Project Summary

ERNIE-Image is an open-weight text-to-image generation model from Baidu, built on a compact 8B-parameter Diffusion Transformer (DiT). The project reports state-of-the-art performance among open-weight models. It targets researchers and developers who want high-quality image synthesis on consumer hardware, and its stated strengths are text-heavy visuals, complex instruction following, and structured content generation.

How It Works

The core architecture is a single-stream Diffusion Transformer (DiT) with 8 billion parameters. It is paired with a lightweight Prompt Enhancer (PE) that expands brief user inputs into richer, structured descriptions before generation. According to the project, this combination lets ERNIE-Image rival larger models, particularly on precise text rendering and intricate scene composition.
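The two-stage flow described above can be sketched as follows. This is only an illustration of the control flow, not Baidu's implementation: `run_pipeline`, `toy_enhancer`, and `toy_generator` are hypothetical names, and the expansion text is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GenerationRequest:
    user_prompt: str
    use_prompt_enhancer: bool = True


def run_pipeline(
    request: GenerationRequest,
    enhance_prompt: Callable[[str], str],
    generate_image: Callable[[str], object],
) -> tuple[str, object]:
    """Optionally expand the prompt, then pass the result to the image generator."""
    prompt = request.user_prompt
    if request.use_prompt_enhancer:
        prompt = enhance_prompt(prompt)
    return prompt, generate_image(prompt)


# Hypothetical stand-ins so the sketch runs without any model weights.
def toy_enhancer(prompt: str) -> str:
    return f"{prompt}. Layout: centered title; style: flat illustration."


def toy_generator(prompt: str) -> dict:
    return {"conditioned_on": prompt}


final_prompt, image = run_pipeline(
    GenerationRequest("a poster for a jazz night"), toy_enhancer, toy_generator
)
raw_prompt, _ = run_pipeline(
    GenerationRequest("a poster for a jazz night", use_prompt_enhancer=False),
    toy_enhancer,
    toy_generator,
)
```

Keeping the enhancer as an optional step mirrors the "w/ PE" and "w/o PE" configurations that the project reports separately.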

Quick Start & Requirements

Install: install Hugging Face diffusers from source with pip install git+https://github.com/huggingface/diffusers, then run pip install -e . in the cloned repo.
Prerequisites: a CUDA-capable GPU; the model is loaded with torch_dtype=torch.bfloat16.
Hardware: a consumer GPU with 24GB VRAM.
Links: Hugging Face Demo, AI Studio Demo, Blog, Discord, X.
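As a rough sketch of what the requirements above imply, the snippet below estimates the memory taken by the DiT weights in bfloat16 and shows one plausible way to load the model through diffusers. The repository id `baidu/ERNIE-Image` and the use of the generic `DiffusionPipeline` loader are assumptions, not details confirmed by this summary; check the project README for the exact call.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone (bfloat16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1024**3


def load_pipeline(model_id: str = "baidu/ERNIE-Image"):
    """Load the model in bfloat16 and move it to the GPU.

    Assumptions: `model_id` is a guess at the Hugging Face repo name, and the
    generic DiffusionPipeline loader is assumed to resolve the right class.
    """
    # Imported lazily so the estimate above works without a GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return pipe.to("cuda")


dit_weights_gib = weight_memory_gib(8e9)
print(f"8B DiT weights in bfloat16: ~{dit_weights_gib:.1f} GiB")
```

The estimate covers only the 8B DiT weights. The text encoder, VAE, and activations need additional memory that this summary does not break down, which is consistent with the stated 24GB VRAM target.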

Highlighted Details

Compact Scale: Reported state-of-the-art performance among open-weight models with an 8B-parameter DiT, reportedly outperforming larger models.
Text Rendering: Excels in dense, long-form, layout-sensitive text for posters, infographics, and UI elements.
Instruction Following: Reliably handles complex prompts with multiple objects, detailed relationships, and knowledge-intensive descriptions.
Structured Generation: Effective for comics, storyboards, and multi-panel compositions.
Deployment: Practical on consumer GPUs with 24GB VRAM.
Versions: ERNIE-Image (50 steps, CFG 4.0) and ERNIE-Image-Turbo (8 steps, CFG 1.0), the latter for faster generation.
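To make the difference between the two versions concrete, here is a small sketch of their sampling settings together with the standard classifier-free guidance (CFG) formula. The step counts and CFG values come from the list above; everything else is an assumption: the field names follow common diffusers argument names, ERNIE-Image may combine predictions differently, and the cost model assumes the unconditional pass is skipped when the scale is 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    num_inference_steps: int
    guidance_scale: float


# Values taken from the summary above; the dictionary keys are just labels.
PRESETS = {
    "ERNIE-Image": SamplingPreset(num_inference_steps=50, guidance_scale=4.0),
    "ERNIE-Image-Turbo": SamplingPreset(num_inference_steps=8, guidance_scale=1.0),
}


def cfg_combine(uncond: float, cond: float, guidance_scale: float) -> float:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return uncond + guidance_scale * (cond - uncond)


def model_evaluations(preset: SamplingPreset) -> int:
    """Forward passes per image, assuming the unconditional pass is skipped at scale 1.0."""
    passes_per_step = 1 if preset.guidance_scale == 1.0 else 2
    return preset.num_inference_steps * passes_per_step


for name, preset in PRESETS.items():
    print(name, model_evaluations(preset))
```

Under these assumptions, a scale of 1.0 reduces the CFG formula to the conditional prediction alone, so the Turbo preset needs far fewer model evaluations per image than the base preset.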

Maintenance & Community

Community channels: WeChat, Discord, X.
Contact: wenxin-all@baidu.com.
Integrations: ComfyUI is supported, and GGUF weights are available via Unsloth.

Licensing & Compatibility

Licensed under the Apache License 2.0.
This permissive license is generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The README does not explicitly detail limitations, alpha status, or known bugs. Reported benchmark scores differ between ERNIE-Image with the Prompt Enhancer (w/ PE) and without it (w/o PE), so results on certain benchmarks depend significantly on whether the Prompt Enhancer is enabled.

Explore Similar Projects

Ovis-Image by ATH-MaaS

UltraPixel by catcathh

f-lite by fal-ai

karlo by kakaobrain

long_stable_diffusion by sharonzhou

kandinsky-5 by kandinskylab

clip-guided-diffusion by afiaka87

GLM-Image by zai-org

StableCascade by Stability-AI

Qwen-Image by QwenLM

IF by deep-floyd

stable-diffusion by CompVis