clip-interrogator by pharmapsychotic

Image-to-prompt tool for text-to-image models

Created 3 years ago

2,925 stars

Top 16.2% on SourcePulse


8 Experts Love This Project

Yaowei Zheng

Author of LLaMA-Factory

Jesse Clark

Cofounder of Marqo

Chuan Li

Chief Scientific Officer at Lambda

Omar Sanseviero

DevRel at Google DeepMind

and 4 more!

Project Summary

This tool generates descriptive text prompts for text-to-image models based on input images, aiding users in creating similar artwork. It is designed for artists, designers, and AI enthusiasts looking to leverage existing visuals for new creations.

How It Works

The CLIP Interrogator combines OpenAI's CLIP and Salesforce's BLIP models to analyze an input image and produce optimized text prompts. BLIP generates a base caption, and CLIP ranks candidate descriptive terms (such as artists, mediums, and styles) by their similarity to the image so that the best matches can be appended to the caption. Users can select the pre-trained CLIP model that matches their target generator (e.g., ViT-L-14/openai for Stable Diffusion 1.X, ViT-H-14/laion2b_s32b_b79k for Stable Diffusion 2.0).
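The caption-plus-ranking idea can be illustrated with a simplified sketch. This is not the library's code (the real tool's term selection is more elaborate). The 3-D vectors are made-up stand-ins for CLIP image and text embeddings, and the caption string stands in for BLIP output.

```python
import math

# Hypothetical toy embeddings standing in for CLIP features.
# Real CLIP embeddings have hundreds of dimensions; 3-D vectors keep this readable.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_terms(image_vec, term_vecs, top_k):
    """Return the top_k term names, ordered by similarity to the image."""
    scored = sorted(term_vecs.items(), key=lambda kv: cosine(image_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]

def build_prompt(caption, image_vec, term_vecs, top_k=2):
    """Append the best-matching terms to a base caption, comma-separated."""
    return ", ".join([caption] + rank_terms(image_vec, term_vecs, top_k))

image_vec = [0.9, 0.1, 0.3]
terms = {
    "oil painting": [0.8, 0.2, 0.4],
    "pixel art": [0.1, 0.9, 0.1],
    "impressionism": [0.7, 0.0, 0.5],
    "3d render": [0.0, 0.3, 0.9],
}
print(build_prompt("a lighthouse on a cliff at sunset", image_vec, terms))
# prints "a lighthouse on a cliff at sunset, oil painting, impressionism"
```

Cosine similarity is used because it compares the direction of embeddings and ignores their magnitude. This is the usual way CLIP image and text features are compared.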

Quick Start & Requirements

Install via pip: pip install clip-interrogator==0.5.4 (or 0.6.0 for BLIP2 support).
Requires PyTorch with GPU support (e.g., pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117).
Default settings require ~6.3GB VRAM; low VRAM settings (~2.7GB) are available.
Official documentation and examples are available in the repository.
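The VRAM figures above suggest a simple rule for choosing between the default and low-VRAM settings. The helper below is hypothetical and does not call the library; the repository documents how the low-VRAM settings are actually enabled. The thresholds are the approximate figures quoted above, and real usage may vary with the chosen CLIP model and driver overhead.

```python
# Approximate VRAM requirements quoted in the summary, in GB.
DEFAULT_VRAM_GB = 6.3
LOW_VRAM_GB = 2.7

def pick_settings(available_gb: float) -> str:
    """Hypothetical helper: pick a settings profile that fits the available VRAM."""
    if available_gb >= DEFAULT_VRAM_GB:
        return "default"
    if available_gb >= LOW_VRAM_GB:
        return "low_vram"
    return "insufficient"

for gb in (8.0, 4.0, 2.0):
    print(gb, pick_settings(gb))
```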

Highlighted Details

Supports custom prompt ranking against user-defined term lists.
Offers a Stable Diffusion Web UI Extension for integrated use.
Available as a Colab notebook, on Hugging Face, and on Replicate for easy access.
Configurable with options for CLIP model selection, caching, and VRAM optimization.
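Ranking against a user-defined term list, combined with caching, can be sketched as follows. This sketch does not use the library's own term-list API, which is documented in the repository. It makes three assumptions: the list is plain text with one term per line; embeddings are already L2-normalized, so a dot product equals cosine similarity; and the hypothetical `EmbeddingCache` stands in for the project's caching option so that each term is embedded only once per model.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]

def parse_terms(text: str) -> List[str]:
    """Read one term per line, skipping blank lines and duplicates (order preserved)."""
    seen = set()
    terms = []
    for line in text.splitlines():
        term = line.strip()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms

class EmbeddingCache:
    """Hypothetical in-memory cache: each (model, term) pair is embedded only once."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.store: Dict[Tuple[str, str], Vector] = {}
        self.misses = 0  # number of times embed() was actually called

    def get(self, model_name: str, term: str) -> Vector:
        key = (model_name, term)
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed(term)
        return self.store[key]

def best_term(image_vec: Vector, terms: List[str], cache: EmbeddingCache, model_name: str) -> str:
    """Return the term with the highest dot product (cosine, if normalized) with the image."""
    def score(term: str) -> float:
        return sum(a * b for a, b in zip(image_vec, cache.get(model_name, term)))
    return max(terms, key=score)

# Hypothetical unit-length toy vectors in place of a real text encoder.
TOY_VECTORS = {"watercolor": [1.0, 0.0], "charcoal sketch": [0.0, 1.0], "digital art": [0.6, 0.8]}
cache = EmbeddingCache(lambda t: TOY_VECTORS[t])
terms = parse_terms("watercolor\n\ncharcoal sketch\nwatercolor\ndigital art\n")
print(best_term([0.8, 0.6], terms, cache, "ViT-L-14/openai"))  # prints "digital art"
```

Keying the cache on the model name as well as the term matters. Embeddings from different CLIP models are not comparable, so switching models must not reuse stale vectors.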

Maintenance & Community

Recent updates include BLIP2 support (version 0.6.0), but the health check below shows the last commit was about a year ago, so ongoing maintenance is uncertain. Community discussion takes place in the project's GitHub repository.

Licensing & Compatibility

This summary does not state the project's license, and it does not address commercial use or closed-source linking. Check the repository's LICENSE file before relying on either.

Limitations & Caveats

Confirm the license terms before any commercial use. The tool is built around prompt generation for specific text-to-image models (the selected CLIP model should match the target generator), so it may not cover every image-to-prompt use case.

Health Check

Last Commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)

Issues (30d)

Star History

5 stars in the last 30 days