rich-text-to-image by songweige

Text-to-image research paper for enhanced generation control

Created 2 years ago

800 stars

Top 44.1% on SourcePulse

Project Summary

This project enables fine-grained control over text-to-image generation by leveraging rich text formatting (font size, color, style, footnotes) to guide diffusion models. It targets researchers and power users seeking to precisely dictate specific attributes of generated images, offering enhanced control beyond standard text prompts.

How It Works

The method first extracts spatial-text associations from a base diffusion model's cross-attention maps. Rich text prompts, encoded into JSON, provide formatting attributes for specific text spans. A novel region-based diffusion process then uses these attributes to render distinct regions with precise control over color, style, and token importance (via font size), resulting in globally coherent images.
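To make the JSON encoding step more concrete, here is a minimal sketch. It assumes the rich-text prompt arrives in Quill Delta form: an `ops` list whose entries carry an `insert` string and optional `attributes`. The function name `parse_rich_text` and the attribute keys in the example (`color`, `size`) are illustrative assumptions, not the repository's actual API.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_rich_text(delta_json: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Split a Quill-Delta-style JSON string into a plain prompt and formatted spans.

    Hypothetical helper for illustration; the project's own parser may differ.
    """
    ops = json.loads(delta_json)["ops"]
    plain_parts: List[str] = []
    spans: List[Dict[str, Any]] = []
    for op in ops:
        text = op.get("insert", "")
        if not isinstance(text, str):
            continue  # skip non-text embeds such as images
        attrs = op.get("attributes")
        if attrs and text.strip():
            # Record only spans that actually carry formatting.
            spans.append({"text": text.strip(), "attributes": dict(attrs)})
        plain_parts.append(text)
    return "".join(plain_parts).strip(), spans


# Example document (made up for this sketch).
example = json.dumps({
    "ops": [
        {"insert": "a garden with "},
        {"insert": "roses", "attributes": {"color": "#ff5733"}},
        {"insert": " near a "},
        {"insert": "fountain", "attributes": {"size": "50px"}},
        {"insert": "\n"},
    ]
})

prompt, spans = parse_rich_text(example)
print(prompt)
for span in spans:
    print(span["text"], span["attributes"])
```

The plain prompt would go to the base diffusion model, while each formatted span identifies a token group whose region could then receive its own attribute-specific guidance.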

Quick Start & Requirements

Install via git clone and conda env create -f environment.yaml, followed by pip install git+https://github.com/openai/CLIP.git.
Requires Python 3.8 and PyTorch 1.11; supports Stable Diffusion v1-5, SDXL, or ANIMAGINE-XL.
Official demo available on HuggingFace Space. An A1111 WebUI extension is also available.

Highlighted Details

Supports LoRA checkpoints and SDXL models.
Enables precise color rendering using hex codes.
Allows local style control via font attributes (e.g., "style of Claude Monet").
Maps font size to token reweighting for emphasis.
Footnotes can provide supplementary descriptions for specific regions.
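Two of the attributes above lend themselves to a short illustration: turning a hex color code into an RGB target, and turning a font size into a token weight. The sketch below is only an assumption-laden example. Both `hex_to_rgb` and `font_size_to_weight` are hypothetical helpers, and the linear size-to-weight rule with a 20px baseline is a placeholder, not the mapping used by the project.

```python
from typing import Tuple


def hex_to_rgb(hex_code: str) -> Tuple[int, int, int]:
    """Convert '#rrggbb' or '#rgb' into an (R, G, B) tuple of 0-255 integers."""
    value = hex_code.lstrip("#")
    if len(value) == 3:
        # Expand shorthand such as '0f0' to '00ff00'.
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {hex_code!r}")
    return (int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16))


def font_size_to_weight(size_px: float, base_px: float = 20.0) -> float:
    """Hypothetical linear rule: text at the baseline size gets weight 1.0.

    Assumption for illustration only; the real reweighting function may differ.
    """
    if size_px <= 0 or base_px <= 0:
        raise ValueError("font sizes must be positive")
    return size_px / base_px


print(hex_to_rgb("#ff5733"))
print(font_size_to_weight(40.0))
```

Under this placeholder rule, larger text yields a weight above 1.0 (more emphasis on that token) and smaller text a weight below 1.0.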

Maintenance & Community

Implemented by Songwei Ge, Taesung Park, Jun-Yan Zhu, and Jia-Bin Huang.
Built upon HuggingFace diffusers and Quill rich-text editor.
Paper accepted by ICCV 2023.

Licensing & Compatibility

The repository does not state a license. The underlying diffusers library is released under Apache 2.0, but that does not determine this project's terms.

Limitations & Caveats

The project does not specify a license, which may impact commercial use or integration into closed-source projects.
Setup pins an older PyTorch version (1.11), which may conflict with newer environments.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days