Word-As-Image by Shiriluz

Research paper implementation for semantic typography

created 2 years ago
1,134 stars

Project Summary

This repository provides the official implementation for "Word-As-Image for Semantic Typography," a SIGGRAPH 2023 Honorable Mention award-winning technique. It automatically generates stylized typography where letterforms visually represent the word's meaning while maintaining readability, targeting designers and researchers interested in creative text generation.

How It Works

The method leverages a large pretrained language-vision model, specifically Stable Diffusion, to distill textual concepts into visual form. It optimizes the outline of each target letter so that its shape conveys the semantic concept, with the diffusion model providing guidance through a score distillation sampling (SDS) loss. Additional loss terms preserve legibility and the original font's style, resulting in simple, black-and-white designs.
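The legibility term can be illustrated with a simplified angle-preservation penalty in the spirit of the paper's as-conformal-as-possible (ACAP) loss. The letter's control points are triangulated, and any change to each triangle's interior angles is penalized. Under this penalty, uniform scaling, rotation, and translation cost nothing, while shearing does. This is a minimal pure-Python sketch that assumes the triangulation is already given rather than computing it. The helper names are hypothetical, and the repository's exact formulation and weighting may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[int, int, int]  # indices into the control-point list (assumed given)


def triangle_angles(points: Sequence[Point], tri: Triangle) -> List[float]:
    """Interior angles (radians) of one triangle, listed per vertex."""
    angles = []
    for k in range(3):
        a = points[tri[k]]
        b = points[tri[(k + 1) % 3]]
        c = points[tri[(k + 2) % 3]]
        # Angle at vertex a between edges a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        angles.append(math.atan2(abs(cross), dot))
    return angles


def angle_preservation_loss(
    original: Sequence[Point],
    deformed: Sequence[Point],
    triangles: Sequence[Triangle],
) -> float:
    """Mean squared change of triangle angles between two point sets (hypothetical helper)."""
    diffs = []
    for tri in triangles:
        before = triangle_angles(original, tri)
        after = triangle_angles(deformed, tri)
        diffs.extend((x - y) ** 2 for x, y in zip(before, after))
    return sum(diffs) / len(diffs)
```

For example, scaling the triangle `[(0, 0), (1, 0), (0, 1)]` by 2 leaves the loss at zero. Moving its third vertex sideways to `(0.5, 1)` yields a positive value.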

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create --name word python=3.8.15), activate it, and install dependencies using pip and conda as specified in the README. This includes PyTorch with CUDA 11.3, diffusers, transformers, diffvg, and various image processing and SVG libraries.
  • Prerequisites: CUDA 11.3, Python 3.8.15, HuggingFace access token for Stable Diffusion.
  • Setup: Requires cloning two repositories (the main repository and diffvg) and installing numerous Python packages, so expect a moderately involved setup.
  • Usage: Run experiments via run_word_as_image.sh or python code/main.py with arguments like --semantic_concept, --optimized_letter, and --font.
  • Links: Official Implementation, SIGGRAPH Paper
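A small helper can assemble `code/main.py` invocations, for instance to try the same concept on each distinct letter of a word. It uses only the three flags listed above. Running one letter per invocation is a choice made for this sketch, not a documented batch mode. The font name is a hypothetical placeholder.

```python
import shlex
from typing import List


def build_command(semantic_concept: str, optimized_letter: str, font: str) -> List[str]:
    """Argument list for one code/main.py run, using only flags named in the README."""
    return [
        "python", "code/main.py",
        "--semantic_concept", semantic_concept,
        "--optimized_letter", optimized_letter,
        "--font", font,
    ]


def commands_for_word(word: str, semantic_concept: str, font: str) -> List[List[str]]:
    """One command per distinct letter of `word`, in first-appearance order (hypothetical helper)."""
    letters = list(dict.fromkeys(word))  # de-duplicate while keeping order
    return [build_command(semantic_concept, ch, font) for ch in letters]


# "ExampleFont-Regular" is a placeholder; substitute a font shipped with the repository.
for cmd in commands_for_word("BUNNY", "BUNNY", "ExampleFont-Regular"):
    print(shlex.join(cmd))  # or subprocess.run(cmd, check=True) from the repository root
```

Returning argument lists instead of shell strings avoids quoting problems when the list is passed directly to `subprocess.run`.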

Highlighted Details

  • SIGGRAPH 2023 Honorable Mention Award winner.
  • Integrates Stable Diffusion with Diffvg for vector graphic optimization.
  • Focuses on semantic understanding and creative visualization within letterforms.
  • Generates minimal, flat, 2D vector designs.
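Vector letterforms are commonly stored as sequences of Bézier segments. diffvg rasterizes such paths differentiably, which lets image-space losses move the control points. The sketch below evaluates one cubic Bézier segment in pure Python. Each curve point is a fixed weighted sum of the four control points, so the derivative of a curve point with respect to a control point is simply its Bernstein weight. The sketch is illustrative only and does not use diffvg's API.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bernstein_weights(t: float) -> Tuple[float, float, float, float]:
    """Cubic Bernstein basis at parameter t in [0, 1]."""
    u = 1.0 - t
    return (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)


def cubic_bezier(ctrl: List[Point], t: float) -> Point:
    """Point on a cubic Bézier segment; linear in the four control points."""
    w = bernstein_weights(t)
    x = sum(wi * p[0] for wi, p in zip(w, ctrl))
    y = sum(wi * p[1] for wi, p in zip(w, ctrl))
    return (x, y)


def sample_segment(ctrl: List[Point], n: int) -> List[Point]:
    """n >= 2 evenly spaced samples along the segment, endpoints included."""
    return [cubic_bezier(ctrl, i / (n - 1)) for i in range(n)]


segment = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
print(cubic_bezier(segment, 0.5))  # (2.0, 1.5)
# d(point)/d(ctrl[k]) equals bernstein_weights(t)[k] in each coordinate,
# so gradients on curve points map linearly onto the control points.
print(bernstein_weights(0.5))      # (0.125, 0.375, 0.375, 0.125)
```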

Maintenance & Community

The project was developed by its academic authors. The README lists no community channels (such as Discord or Slack) and gives no signals of active maintenance.

Licensing & Compatibility

Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license prohibits commercial use and requires derivative works to be shared under the same or a compatible license.

Limitations & Caveats

The CC BY-NC-SA 4.0 license rules out commercial use. The implementation depends on specific PyTorch and CUDA versions and requires a HuggingFace access token, which may raise the barrier to adoption. To adjust output quality, the README suggests tuning parameters such as the weight of the as-conformal-as-possible loss (L_acap) and the sigma of the low-pass filter.
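The effect of the low-pass filter sigma can be seen in a simplified tone-preservation comparison. Both the original and the deformed letter rasters are blurred with a Gaussian of standard deviation sigma, then compared by mean squared error. A larger sigma therefore ignores finer local differences. This pure-Python sketch rests on that assumption. The image sizes, the clamp-to-edge padding, and the function names are hypothetical and may not match the repository.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale, row-major, values in [0, 1]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(row: List[float], kernel: List[float]) -> List[float]:
    """Convolve one row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp-to-edge padding
            acc += w * row[j]
        out.append(acc)
    return out


def gaussian_blur(img: Image, sigma: float) -> Image:
    """Separable 2D Gaussian blur: horizontal pass, then vertical pass."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(r, kernel) for r in img]
    cols = [blur_1d(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def tone_loss(original: Image, deformed: Image, sigma: float) -> float:
    """MSE between low-pass filtered images (hypothetical stand-in for the tone term)."""
    a = gaussian_blur(original, sigma)
    b = gaussian_blur(deformed, sigma)
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


stripes = [[1.0 if c % 2 == 0 else 0.0 for c in range(8)] for _ in range(8)]
complement = [[1.0 - v for v in row] for row in stripes]
for sigma in (0.5, 3.0):
    print(sigma, round(tone_loss(stripes, complement, sigma), 3))
```

With these complementary stripe patterns, the loss at `sigma=3.0` is lower than at `sigma=0.5`. Heavier blurring averages both patterns toward the same mid-gray.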

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 0
  • Star history: 8 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

0.1%
71k
Latent text-to-image diffusion model
created 3 years ago
updated 1 year ago