baidu
Advanced text-to-image generation model
Top 73.6% on SourcePulse
Summary
ERNIE-Image is an open-weight, text-to-image generation model from Baidu, achieving state-of-the-art performance with a compact 8B Diffusion Transformer (DiT) architecture. It targets researchers and developers, enabling high-quality image synthesis on consumer hardware, with strengths in text-heavy visuals, complex instruction following, and structured content generation.
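To put the "consumer hardware" claim in perspective, a back-of-the-envelope estimate of weight memory can help. The sketch below assumes exactly 8 billion parameters and that the weights are held in bfloat16 (2 bytes per parameter), as the torch_dtype=torch.bfloat16 setting in the Quick Start suggests. The function name is illustrative, and the estimate covers the DiT weights only, not activations, the text encoder, or the VAE.

```python
# Rough weight-memory estimate for a model of a given parameter count.
# Assumption: only the weights are counted; activations and auxiliary
# models (text encoder, VAE, Prompt Enhancer) are NOT included.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the memory needed to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16"):
        gib = weight_memory_gib(8e9, dtype)
        print(f"8B parameters in {dtype}: {gib:.1f} GiB")
```

Under these assumptions, bfloat16 halves the weight footprint relative to float32 (roughly 15 GiB instead of roughly 30 GiB), which is presumably why the half-precision dtype is recommended.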
How It Works
The core architecture features a single-stream Diffusion Transformer (DiT) comprising 8 billion parameters. It is enhanced by a lightweight Prompt Enhancer (PE) that expands brief user inputs into richer, structured descriptions. This synergistic approach allows ERNIE-Image to rival larger models, particularly for precise text rendering and intricate scene composition.
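The two-stage flow described above (an optional Prompt Enhancer feeding the DiT) can be sketched roughly as follows. This is not ERNIE-Image's actual API: the class name, the use_enhancer flag, and both toy stand-in functions are hypothetical and exist only to show how a brief prompt is expanded before it conditions the generator.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class TwoStagePipeline:
    """Hypothetical wrapper: a prompt enhancer followed by an image generator."""

    enhance: Callable[[str], str]    # stands in for the Prompt Enhancer (PE)
    generate: Callable[[str], Any]   # stands in for the DiT sampling loop
    use_enhancer: bool = True        # lets callers compare w/ PE and w/o PE

    def __call__(self, prompt: str) -> Tuple[str, Any]:
        # Stage 1: optionally expand the brief prompt into a richer description.
        final_prompt = self.enhance(prompt) if self.use_enhancer else prompt
        # Stage 2: condition the generator on whichever prompt was chosen.
        return final_prompt, self.generate(final_prompt)


# Toy stand-ins (assumptions for illustration, not real models).
def toy_enhancer(prompt: str) -> str:
    return f"{prompt}. Layout: centered subject. Style: photorealistic."


def toy_generator(prompt: str) -> dict:
    return {"conditioned_on": prompt}


if __name__ == "__main__":
    pipe = TwoStagePipeline(toy_enhancer, toy_generator)
    enhanced_prompt, image = pipe("a red bicycle")
    print(enhanced_prompt)
```

Keeping the enhancer behind a flag mirrors the "w/ PE" and "w/o PE" configurations that the benchmarks distinguish, so the same prompt can be run both ways.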
Quick Start & Requirements
diffusers: pip install git+https://github.com/huggingface/diffusers, then pip install -e . in the cloned repo.
torch_dtype=torch.bfloat16.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not explicitly detail limitations, alpha status, or known bugs. Performance metrics show variations between ERNIE-Image (w/ PE) and ERNIE-Image (w/o PE), highlighting the Prompt Enhancer's significant impact on certain benchmarks.