clip-guided-diffusion by afiaka87

CLI tool for text-to-image generation using CLIP-guided diffusion

created 4 years ago
462 stars

Top 66.5% on sourcepulse

Project Summary

This repository provides a command-line interface (CLI) and Python module for generating images from text prompts using guided diffusion models and OpenAI's CLIP. It's designed for researchers and artists interested in exploring text-to-image synthesis with fine-grained control over the generation process. The tool allows for image blending, weighted prompts, and various diffusion schedulers, offering a flexible approach to creative AI.

How It Works

The core of the tool leverages Katherine Crowson's guided diffusion models, combined with CLIP for text-image alignment. It iteratively refines an image based on a text prompt, guiding the diffusion process towards a representation that matches the prompt's semantic meaning as interpreted by CLIP. Users can influence the generation by adjusting parameters like clip_guidance_scale, tv_scale (for smoothness), and timestep_respacing (to trade speed for accuracy). It also supports blending with an initial image using perceptual loss.
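In spirit, each guidance step nudges the current image along the gradient of CLIP similarity while a total-variation (TV) term penalizes pixel-level noise. A minimal NumPy sketch of that trade-off follows; this is a toy update rule for illustration, not the repository's sampler, and `clip_grad` simply stands in for the gradient of the negative CLIP score supplied from elsewhere:

```python
import numpy as np

def tv_grad(img):
    """Gradient of a total-variation-style penalty (half the sum of squared
    neighbor differences): pushes each pixel toward its neighbors."""
    grad = np.zeros_like(img)
    dx = np.diff(img, axis=1)  # horizontal neighbor differences
    grad[:, 1:] += dx
    grad[:, :-1] -= dx
    dy = np.diff(img, axis=0)  # vertical neighbor differences
    grad[1:, :] += dy
    grad[:-1, :] -= dy
    return grad

def guided_step(img, clip_grad, clip_guidance_scale=1000.0, tv_scale=150.0, lr=1e-4):
    """One toy guidance update: descend the combined gradient of the
    (negative) CLIP score and the TV smoothness penalty."""
    total_grad = clip_guidance_scale * clip_grad + tv_scale * tv_grad(img)
    return img - lr * total_grad
```

In the actual tool, raising tv_scale trades fine detail for smoothness, while clip_guidance_scale controls how aggressively the sample chases the prompt.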

Quick Start & Requirements

  • Install: Clone the repository, clone Katherine Crowson's guided-diffusion fork inside it, then install both:
    git clone https://github.com/afiaka87/clip-guided-diffusion.git
    cd clip-guided-diffusion
    git clone https://github.com/crowsonkb/guided-diffusion.git
    pip3 install -e guided-diffusion
    python3 setup.py install
    
  • Run: cgd --prompts "Alien friend by Odilon Redon"
  • Prerequisites: Python 3, CUDA-enabled GPU recommended (CPU is very slow). Checkpoints are downloaded automatically to ~/.cache/clip-guided-diffusion/.
  • Docs: Full Usage

Highlighted Details

  • Supports weighted prompts (e.g., "prompt1:1.0|prompt2:-0.5") for nuanced control.
  • Allows blending with an existing image (--init_image) and skipping initial timesteps.
  • Offers experimental support for non-square image generation via width_offset and height_offset.
  • Integrates with Weights & Biases (wandb) for logging intermediate outputs.
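The weighted-prompt syntax above can be illustrated with a small parser. This is a hypothetical sketch of how a "text:weight|text:weight" spec could be split into (text, weight) pairs; parse_prompts and _is_number are illustrative names, not functions from the repository:

```python
def _is_number(s):
    """Return True if s parses as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def parse_prompts(spec, default_weight=1.0):
    """Split a 'text:weight|text:weight' spec into (text, weight) pairs.
    A chunk without a numeric ':weight' suffix gets default_weight;
    negative weights steer generation *away* from that prompt."""
    pairs = []
    for chunk in spec.split("|"):
        text, sep, weight = chunk.rpartition(":")
        if sep and _is_number(weight):
            pairs.append((text.strip(), float(weight)))
        else:
            pairs.append((chunk.strip(), default_weight))
    return pairs
```

For example, "prompt1:1.0|prompt2:-0.5" yields one positively and one negatively weighted prompt, matching the nuance described above.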

Maintenance & Community

The original author has redirected their efforts to pyglide and may be slow to address bugs; they recommend exploring @crowsonkb's v-diffusion-pytorch instead.

Licensing & Compatibility

The repository does not state a license in its README. The underlying guided-diffusion fork by crowsonkb derives from OpenAI's MIT-licensed guided-diffusion code, but the license of the specific version used should be verified.

Limitations & Caveats

The project is in maintenance mode, with the author focusing on other projects. This may lead to slower bug fixes or feature development. The 64x64 checkpoint requires specific parameter tuning (clip_guidance_scale, tv_scale) due to a different noise scheduler.
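The timestep_respacing option mentioned in How It Works can be illustrated with a simplified sketch that picks an evenly spaced, descending subset of the training timesteps, so sampling takes fewer steps at some cost in fidelity. This is a hypothetical helper for intuition only (the real guided-diffusion option also accepts strided-section and "ddimN" forms):

```python
def respace_timesteps(num_train_steps, num_sample_steps):
    """Select an evenly spaced, descending subset of the training
    timesteps, so sampling runs in num_sample_steps iterations
    instead of num_train_steps."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sample_steps <= num_train_steps")
    stride = num_train_steps / num_sample_steps
    steps = {round(i * stride) for i in range(num_sample_steps)}
    return sorted(steps, reverse=True)  # diffusion samples from noisy to clean
```

For a model trained with 1000 steps, respace_timesteps(1000, 25) visits every 40th timestep, which is why heavily respaced runs are faster but may need the guidance scales retuned.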

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 2 more.

glide-text2im by openai

4k stars · 0.1% · created 3 years ago · updated 1 year ago
Text-conditional image synthesis model from research paper
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

41k stars · 0.1% · created 2 years ago · updated 1 month ago
Latent diffusion model for high-resolution image synthesis
Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

71k stars · 0.1% · created 3 years ago · updated 1 year ago
Latent text-to-image diffusion model