hard-prompts-made-easy by YuxinWenRick

Prompt optimizer for text-to-image generation

Created 2 years ago

645 stars

Top 51.7% on SourcePulse

3 Experts Love This Project

Andreas Jansson

Cofounder of Replicate

Elvis Saravia

Founder of DAIR.AI

Edward Sun

Research Scientist at Meta Superintelligence Lab

Project Summary

This repository is the official implementation of PEZ, a gradient-based discrete optimization algorithm for learning "hard prompts" (prompts made of actual vocabulary tokens). It is aimed at researchers and practitioners in prompt engineering and generative AI who want to discover effective text prompts for image generation models such as Stable Diffusion.

How It Works

Given a target image, the method uses CLIP encoders to optimize a discrete "hard prompt" that matches the image. The optimized prompt is then fed into Stable Diffusion to generate new images. PEZ (short for "hard Prompts made EaZy") is the gradient-based discrete optimization algorithm that drives this search.
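
The optimization loop described above can be illustrated on a toy problem. The following is a minimal sketch, assuming a projected-gradient scheme: a continuous prompt is projected onto its nearest vocabulary embeddings, the gradient is evaluated at that projected (discrete) prompt, and the update is applied to the continuous prompt. It is not the repository's code. The random vocabulary, the squared-distance objective (standing in for CLIP image-text similarity), and the names nearest_token, loss_and_grad, and pez_sketch are all assumptions made for illustration.

```python
import random

def nearest_token(vec, vocab):
    """Index of the vocabulary embedding closest to vec (squared Euclidean)."""
    return min(range(len(vocab)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, vocab[j])))

def loss_and_grad(token_ids, vocab, target):
    """Toy objective: squared distance between the mean token embedding and target.

    Stands in for a CLIP similarity loss; returns the loss and its gradient
    with respect to each token embedding (identical for every position here).
    """
    k, d = len(token_ids), len(target)
    mean = [sum(vocab[t][c] for t in token_ids) / k for c in range(d)]
    diff = [m - t for m, t in zip(mean, target)]
    loss = sum(x * x for x in diff)
    grad = [2.0 * x / k for x in diff]
    return loss, grad

def pez_sketch(vocab, target, prompt_len=4, lr=0.5, steps=200, seed=0):
    """Projected-gradient search for a discrete prompt (hypothetical helper)."""
    rng = random.Random(seed)
    # Continuous ("soft") prompt, initialised from random vocabulary entries.
    soft = [list(vocab[rng.randrange(len(vocab))]) for _ in range(prompt_len)]
    best_ids, best_loss = None, float("inf")
    for _ in range(steps):
        # Project each soft embedding onto its nearest hard token.
        ids = [nearest_token(p, vocab) for p in soft]
        # Evaluate loss and gradient at the projected, discrete prompt.
        loss, grad = loss_and_grad(ids, vocab, target)
        if loss < best_loss:
            best_ids, best_loss = ids, loss
        # Apply that gradient to the continuous prompt, not the projected one.
        soft = [[x - lr * g for x, g in zip(p, grad)] for p in soft]
    return best_ids, best_loss
```

Calling pez_sketch(vocab, target) with a list of embedding vectors and a target vector returns the best token ids seen and their loss. Keeping the best discrete prompt seen so far is a choice made in this sketch, because the loss of the projected prompt does not have to decrease at every step.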

Quick Start & Requirements

Install dependencies: create and activate a virtual environment, then run pip install -r requirements.txt.
Prerequisites: PyTorch >= 1.13.0, transformers >= 4.23.1, diffusers >= 0.11.1, sentence-transformers >= 2.2.2, ftfy >= 6.1.1, mediapy >= 1.1.2.
Usage: python run.py image.png
Demos available on Colab and Hugging Face Space. Examples are in the examples/ folder.
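
Before running the script, it can help to confirm that the environment meets the minimum versions listed above. The following is a small sketch using only the Python standard library. The distribution names (for example torch for PyTorch) are assumed to be the usual pip names, and parse_version and check_requirements are hypothetical helpers, not part of the repository.

```python
import re
from importlib import metadata

# Minimum versions as listed in the prerequisites; keys are pip distribution names.
MINIMUMS = {
    "torch": "1.13.0",
    "transformers": "4.23.1",
    "diffusers": "0.11.1",
    "sentence-transformers": "2.2.2",
    "ftfy": "6.1.1",
    "mediapy": "1.1.2",
}

def parse_version(text):
    """Leading numeric part of a version string, padded to at least 3 fields."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    return parts + (0,) * (3 - len(parts))

def check_requirements(minimums=MINIMUMS):
    """Map each distribution name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report

if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

The comparison only looks at the leading numeric fields, so local build suffixes such as +cu118 are ignored. This is a simplification. A stricter check would use a full version parser.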

Highlighted Details

Optimizes discrete prompts using the PEZ algorithm and CLIP encoders.
Supports prompt inversion from images.
Configurable parameters include prompt length, learning rate, batch size, and CLIP model choice.
Code for language model prompt experiments is available in prompt_lm/.
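
The configurable parameters listed above (prompt length, learning rate, batch size, CLIP model choice) could be grouped into a single validated settings object. The following is a sketch only. The class name PromptOptConfig and its field names are hypothetical and may not match the repository's own configuration keys. The learning rate and batch size defaults are illustrative, and only the prompt length of 16 comes from this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptOptConfig:
    """Hypothetical container for the tunable settings named on this page."""
    clip_model: str        # which CLIP variant to use; no default is assumed
    prompt_len: int = 16   # length this page reports as most generalizable
    lr: float = 0.1        # illustrative default, not taken from the repository
    batch_size: int = 1    # illustrative default, not taken from the repository

    def __post_init__(self):
        # Reject obviously invalid settings early.
        if not self.clip_model:
            raise ValueError("clip_model must be a non-empty string")
        if self.prompt_len < 1:
            raise ValueError("prompt_len must be at least 1")
        if self.lr <= 0:
            raise ValueError("lr must be positive")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
```

Making the dataclass frozen keeps a run's settings immutable once validated, which makes it easier to log or compare configurations across experiments.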

Maintenance & Community

Contact: Yuxin (ywen@umd.edu) for questions.
No explicit community links (Discord, Slack) or roadmap are provided in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

Because no license is specified, commercial use and closed-source integration may be restricted.
Empirical results suggest a prompt length of 16 tokens often yields the most generalizable performance.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days