clip-guided-diffusion by afiaka87

CLI tool for text-to-image generation using CLIP-guided diffusion

created 4 years ago
462 stars

Top 66.5% on sourcepulse

Project Summary

This repository provides a command-line interface (CLI) and Python module for generating images from text prompts using guided diffusion models and OpenAI's CLIP. It's designed for researchers and artists interested in exploring text-to-image synthesis with fine-grained control over the generation process. The tool allows for image blending, weighted prompts, and various diffusion schedulers, offering a flexible approach to creative AI.

How It Works

The core of the tool leverages Katherine Crowson's guided diffusion models, combined with CLIP for text-image alignment. It iteratively refines an image based on a text prompt, guiding the diffusion process towards a representation that matches the prompt's semantic meaning as interpreted by CLIP. Users can influence the generation by adjusting parameters like clip_guidance_scale, tv_scale (for smoothness), and timestep_respacing (to trade speed for accuracy). It also supports blending with an initial image using perceptual loss.
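In spirit, each guidance step nudges the current image along the gradient of CLIP similarity while a total-variation (TV) term penalizes pixel-level noise. A minimal NumPy sketch of that trade-off follows; this is a toy update rule for illustration, not the repository's sampler, and `clip_grad` simply stands in for the gradient of the negative CLIP score supplied from elsewhere:

```python
import numpy as np

def tv_grad(img):
    """Gradient of a total-variation-style penalty (half the sum of squared
    neighbor differences): pushes each pixel toward its neighbors."""
    grad = np.zeros_like(img)
    dx = np.diff(img, axis=1)  # horizontal neighbor differences
    grad[:, 1:] += dx
    grad[:, :-1] -= dx
    dy = np.diff(img, axis=0)  # vertical neighbor differences
    grad[1:, :] += dy
    grad[:-1, :] -= dy
    return grad

def guided_step(img, clip_grad, clip_guidance_scale=1000.0, tv_scale=150.0, lr=1e-4):
    """One toy guidance update: descend the combined gradient of the
    (negative) CLIP score and the TV smoothness penalty."""
    total_grad = clip_guidance_scale * clip_grad + tv_scale * tv_grad(img)
    return img - lr * total_grad
```

In the actual tool, raising tv_scale trades fine detail for smoothness, while clip_guidance_scale controls how aggressively the sample chases the prompt.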

Quick Start & Requirements

  • Install: Clone the repository, clone Katherine Crowson's guided-diffusion fork inside it, then install both:
    git clone https://github.com/afiaka87/clip-guided-diffusion.git
    cd clip-guided-diffusion
    git clone https://github.com/crowsonkb/guided-diffusion.git
    pip3 install -e guided-diffusion
    python3 setup.py install
    
  • Run: cgd --prompts "Alien friend by Odilon Redon"
  • Prerequisites: Python 3, CUDA-enabled GPU recommended (CPU is very slow). Checkpoints are downloaded automatically to ~/.cache/clip-guided-diffusion/.
  • Docs: Full Usage

Highlighted Details

  • Supports weighted prompts (e.g., "prompt1:1.0|prompt2:-0.5") for nuanced control.
  • Allows blending with an existing image (--init_image) and skipping initial timesteps.
  • Offers experimental support for non-square image generation via width_offset and height_offset.
  • Integrates with Weights & Biases (wandb) for logging intermediate outputs.
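The weighted-prompt syntax above can be illustrated with a small parser. This is a hypothetical sketch of how a "text:weight|text:weight" spec could be split into (text, weight) pairs; parse_prompts and _is_number are illustrative names, not functions from the repository:

```python
def _is_number(s):
    """Return True if s parses as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def parse_prompts(spec, default_weight=1.0):
    """Split a 'text:weight|text:weight' spec into (text, weight) pairs.
    A chunk without a numeric ':weight' suffix gets default_weight;
    negative weights steer generation *away* from that prompt."""
    pairs = []
    for chunk in spec.split("|"):
        text, sep, weight = chunk.rpartition(":")
        if sep and _is_number(weight):
            pairs.append((text.strip(), float(weight)))
        else:
            pairs.append((chunk.strip(), default_weight))
    return pairs
```

For example, "prompt1:1.0|prompt2:-0.5" yields one positively and one negatively weighted prompt, matching the nuance described above.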

Maintenance & Community

The original author has redirected their efforts to pyglide and may be slow to address bugs; they recommend exploring @crowsonkb's v-diffusion-pytorch instead.

Licensing & Compatibility

The repository does not state a license in its README. The underlying guided-diffusion fork by crowsonkb derives from OpenAI's MIT-licensed guided-diffusion code, but the license of the specific version used should be verified.

Limitations & Caveats

The project is in maintenance mode, with the author focusing on other projects. This may lead to slower bug fixes or feature development. The 64x64 checkpoint requires specific parameter tuning (clip_guidance_scale, tv_scale) due to a different noise scheduler.
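The timestep_respacing option mentioned in How It Works can be illustrated with a simplified sketch that picks an evenly spaced, descending subset of the training timesteps, so sampling takes fewer steps at some cost in fidelity. This is a hypothetical helper for intuition only (the real guided-diffusion option also accepts strided-section and "ddimN" forms):

```python
def respace_timesteps(num_train_steps, num_sample_steps):
    """Select an evenly spaced, descending subset of the training
    timesteps, so sampling runs in num_sample_steps iterations
    instead of num_train_steps."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sample_steps <= num_train_steps")
    stride = num_train_steps / num_sample_steps
    steps = {round(i * stride) for i in range(num_sample_steps)}
    return sorted(steps, reverse=True)  # diffusion samples from noisy to clean
```

For a model trained with 1000 steps, respace_timesteps(1000, 25) visits every 40th timestep, which is why heavily respaced runs are faster but may need the guidance scales retuned.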

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 2 more.

glide-text2im by openai

4k stars · 0.1% · created 3 years ago · updated 1 year ago
Text-conditional image synthesis model from research paper
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

41k stars · 0.1% · created 2 years ago · updated 1 month ago
Latent diffusion model for high-resolution image synthesis
Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

71k stars · 0.1% · created 3 years ago · updated 1 year ago
Latent text-to-image diffusion model