rich-text-to-image  by songweige

Text-to-image research paper for enhanced generation control

created 2 years ago
793 stars

Top 45.2% on sourcepulse

Project Summary

This project enables fine-grained control over text-to-image generation by leveraging rich text formatting (font size, color, style, footnotes) to guide diffusion models. It targets researchers and power users seeking to precisely dictate specific attributes of generated images, offering enhanced control beyond standard text prompts.

How It Works

The method first extracts spatial-text associations from a base diffusion model's cross-attention maps. Rich text prompts, encoded into JSON, provide formatting attributes for specific text spans. A novel region-based diffusion process then uses these attributes to render distinct regions with precise control over color, style, and token importance (via font size), resulting in globally coherent images.
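As a rough illustration of the "rich text encoded into JSON" step, the sketch below assumes the JSON follows the Quill Delta layout: a list of `ops`, each carrying an `insert` string and optional `attributes`. The function name `split_rich_text` and the `color` attribute key are illustrative assumptions, not the project's confirmed schema.

```python
import json


def split_rich_text(delta_json):
    """Return (plain_prompt, spans) from a Quill-Delta-style JSON string.

    Each span records the character range it covers in the plain prompt
    and the formatting attributes attached to that range.
    Hypothetical helper; the project's own parser may differ.
    """
    ops = json.loads(delta_json)["ops"]
    plain_parts = []
    spans = []
    cursor = 0
    for op in ops:
        text = op.get("insert", "")
        if not isinstance(text, str):
            continue  # skip non-text embeds such as images
        attrs = op.get("attributes")
        if attrs:
            spans.append({
                "start": cursor,
                "end": cursor + len(text),
                "text": text,
                "attributes": dict(attrs),
            })
        plain_parts.append(text)
        cursor += len(text)
    # Quill documents end with a newline; drop it from the plain prompt.
    plain = "".join(plain_parts).rstrip("\n")
    return plain, spans


# Example input: the word "rose" is colored red (assumed attribute key).
example = json.dumps({"ops": [
    {"insert": "a "},
    {"attributes": {"color": "#ff0000"}, "insert": "rose"},
    {"insert": " in a garden\n"},
]})
prompt, spans = split_rich_text(example)
```

The plain prompt would go to the base model as usual, while each span's character range identifies which tokens the formatting attributes apply to.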

Quick Start & Requirements

  • Install by cloning the repository and running conda env create -f environment.yaml, followed by pip install git+https://github.com/openai/CLIP.git.
  • Requires Python 3.8 and PyTorch 1.11; supports Stable Diffusion v1-5, SDXL, or ANIMAGINE-XL.
  • Official demo available on HuggingFace Space. An A1111 WebUI extension is also available.

Highlighted Details

  • Supports LoRA checkpoints and SDXL models.
  • Enables precise color rendering using hex codes.
  • Allows local style control via font attributes (e.g., "style of Claude Monet").
  • Maps font size to token reweighting, so larger text gives a token more emphasis.
  • Footnotes can provide supplementary descriptions for specific regions.
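
Two of the attributes above can be sketched as small conversions, under stated assumptions. `hex_to_rgb` shows how a hex code might become a normalized RGB target, and `font_size_to_weight` shows one hypothetical linear mapping from font size to a token weight. Both function names, the 16 px baseline, and the clamping bounds are invented for illustration and are not taken from the project.

```python
def hex_to_rgb(hex_code):
    """Convert '#rrggbb' or '#rgb' into an (r, g, b) tuple of floats in [0, 1]."""
    code = hex_code.lstrip("#")
    if len(code) == 3:
        # Expand shorthand such as 'f0a' to 'ff00aa'.
        code = "".join(ch * 2 for ch in code)
    if len(code) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {hex_code!r}")
    return tuple(int(code[i:i + 2], 16) / 255.0 for i in (0, 2, 4))


def font_size_to_weight(size_px, base_px=16.0, min_weight=0.1, max_weight=5.0):
    """Hypothetical mapping: weight grows linearly with font size, then is clamped.

    A span at the baseline size gets weight 1.0 (no reweighting).
    """
    weight = size_px / base_px
    return max(min_weight, min(max_weight, weight))


red_target = hex_to_rgb("#ff0000")
double_emphasis = font_size_to_weight(32)
```

A linear ratio is only one possible choice; the actual mapping used by the method may be nonlinear.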

Maintenance & Community

  • Implemented by Songwei Ge, Taesung Park, Jun-Yan Zhu, and Jia-Bin Huang.
  • Built upon HuggingFace diffusers and Quill rich-text editor.
  • Paper accepted by ICCV 2023.

Licensing & Compatibility

  • The repository does not explicitly state a license. The underlying diffusers library is released under Apache 2.0, but that does not establish the terms for this project.

Limitations & Caveats

  • The project does not specify a license, which may impact commercial use or integration into closed-source projects.
  • Setup pins an older PyTorch release (1.11), which may conflict with newer environments.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 6 stars in the last 90 days
