ideogram4 by ideogram-oss

Cutting-edge open-weight image generation model

Created 2 months ago

2,550 stars

Top 17.5% on SourcePulse

Project Summary

Ideogram 4 is an open-weight text-to-image foundation model built around fine-grained user control. It addresses three limitations of existing open models by providing state-of-the-art text rendering, explicit layout control, and native high-resolution image generation. It is aimed at researchers, engineers, and power users who need a capable tool for complex visual design tasks.

How It Works

The model uses a novel, fully single-stream Diffusion Transformer (DiT) architecture, trained from scratch. Unlike models with separate text and image branches, Ideogram 4 concatenates text and image tokens into one sequence that a single transformer processes. Its text encoder is Qwen3-VL-8B-Instruct, a vision-language model; hidden states are extracted from multiple intermediate layers to give a richer, multi-scale semantic representation of the prompt. This design enables deep cross-modal interaction and supports fine-grained control through structured JSON prompting.
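
As a rough illustration of the single-stream idea described above, the toy sketch below concatenates text and image tokens into one sequence and runs plain self-attention over it, so every image token can attend to every text token and vice versa. It is not Ideogram's implementation: the function names, the identity projections, and the tiny dimensions are all assumptions chosen to keep the example dependency-free.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Toy single-head self-attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out

def single_stream_block(text_tokens, image_tokens):
    """Hypothetical helper: one attention pass over the concatenated sequence."""
    sequence = text_tokens + image_tokens      # one unified sequence
    mixed = self_attention(sequence)           # text and image attend jointly
    n_text = len(text_tokens)
    return mixed[:n_text], mixed[n_text:]      # split back for convenience

image = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
_, image_a = single_stream_block([[1.0, 0.0, 0.0, 0.0]], image)
_, image_b = single_stream_block([[0.0, 1.0, 0.0, 0.0]], image)
print(image_a[0] != image_b[0])  # image outputs change when the text changes
```

A dual-stream design would instead keep separate text and image branches; here the only structure is the position of each token in the shared sequence.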

Quick Start & Requirements

Install from the repository root with pip install . (or pip install -e . for an editable development install). Model weights are gated on Hugging Face (ideogram-ai/ideogram-4-nf4 or ideogram-ai/ideogram-4-fp8): accept the license, then authenticate with hf auth login or by exporting HF_TOKEN. The command-line interface (CLI) uses a "magic prompt" LLM to convert plain text into structured JSON captions; by default this calls Ideogram's hosted API, which requires an IDEOGRAM_API_KEY. Safety screening via Hive requires additional API keys (HIVE_TEXT_MODERATION_KEY, HIVE_VISUAL_MODERATION_KEY). CUDA is supported for nf4 quantization. Online inference is available at ideogram.ai.
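
Because several features depend on different credentials, a small pre-flight check can help before running the CLI. The sketch below only inspects environment variables named in this summary; the helper name missing_credentials and the feature grouping are illustrative assumptions, not part of the project. Note that HF_TOKEN may legitimately be unset if you authenticated with hf auth login instead.

```python
import os

# Variable names come from the summary above; the grouping is illustrative.
REQUIRED_BY_FEATURE = {
    "gated weight download": ["HF_TOKEN"],
    "magic prompt (hosted API)": ["IDEOGRAM_API_KEY"],
    "Hive safety screening": [
        "HIVE_TEXT_MODERATION_KEY",
        "HIVE_VISUAL_MODERATION_KEY",
    ],
}

def missing_credentials(env=None):
    """Return {feature: [unset variable names]} for features lacking a key."""
    env = os.environ if env is None else env
    report = {}
    for feature, names in REQUIRED_BY_FEATURE.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            report[feature] = unset
    return report

for feature, names in missing_credentials().items():
    print(f"{feature}: set {', '.join(names)}")
```

Passing a plain dict instead of os.environ makes the check easy to exercise in tests or CI.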

Highlighted Details

Performance: Ranks as the top open-weight model on design-focused benchmarks such as Design Arena and ContraLabs typography evaluations, and places Ideogram among the top five labs overall on LMArena.
Text Rendering: Delivers best-in-class in-image text generation (signage, logos, captions) among open-weight releases, outperforming significantly larger models.
Controllability: Offers extreme control via structured JSON prompts, enabling explicit spatial layout with bounding boxes, color palette conditioning using hex codes, and flexible aspect ratios up to 6:1 at native 2048 resolution.
Architecture: Features a unique single-stream DiT and a vision-language model text encoder for enhanced semantic understanding and cross-modal interaction.
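
To make the controllability point concrete, the sketch below builds a structured prompt with text elements placed by bounding boxes, a hex-code color palette, and explicit dimensions, then checks it against the limits stated above. The schema is hypothetical: the field names (description, width, height, palette, elements, bbox), the normalized 0 to 1 box coordinates, and the validate_prompt helper are assumptions for illustration, and the project's real JSON caption format may differ.

```python
import json
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")
MAX_ASPECT_RATIO = 6.0  # "up to 6:1" per the summary

def validate_prompt(prompt):
    """Return a list of problems found in a hypothetical structured prompt."""
    problems = []
    width, height = prompt["width"], prompt["height"]
    ratio = max(width, height) / min(width, height)
    if ratio > MAX_ASPECT_RATIO:
        problems.append(f"aspect ratio {ratio:.2f}:1 exceeds 6:1")
    for color in prompt.get("palette", []):
        if not HEX_COLOR.fullmatch(color):
            problems.append(f"bad hex color: {color}")
    for element in prompt.get("elements", []):
        x0, y0, x1, y1 = element["bbox"]  # assumed normalized to [0, 1]
        if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
            problems.append(f"bad bbox for {element['text']!r}")
    return problems

# Hypothetical schema: none of these field names are confirmed by the project.
prompt = {
    "description": "Minimalist poster for a jazz festival",
    "width": 2048,
    "height": 1024,
    "palette": ["#1B1F3B", "#F2A541"],
    "elements": [
        {"text": "BLUE NOTE NIGHTS", "bbox": [0.1, 0.1, 0.9, 0.3]},
        {"text": "June 14", "bbox": [0.1, 0.7, 0.5, 0.85]},
    ],
}

print(validate_prompt(prompt))
print(json.dumps(prompt, indent=2))
```

Validating locally before generation catches malformed palettes or out-of-range boxes early, which is cheaper than discovering them from a poor render.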

Maintenance & Community

The project is actively developed by Ideogram AI, with the latest release on June 3, 2026. Ideogram AI is hiring for research roles focused on next-generation generative models; open positions are listed at https://jobs.ashbyhq.com/ideogram

Licensing & Compatibility

The model weights are released under the "Ideogram 4 Non-Commercial" license. This license explicitly restricts use in commercial applications, which limits the model's suitability for proprietary or commercial products.

Limitations & Caveats

Access to model weights is gated and requires accepting the non-commercial license. Optimal results and full control depend on structured JSON prompts; plain-text prompts may yield less precise outcomes. Full CLI functionality requires obtaining and configuring API keys for auxiliary services such as prompt expansion and content moderation.

Health Check

Last Commit

3 weeks ago

Responsiveness

Inactive

Star History

264 stars in the last 30 days