CLI tool for text-to-image generation using CLIP-guided diffusion
This repository provides a command-line interface (CLI) and Python module for generating images from text prompts using guided diffusion models and OpenAI's CLIP. It's designed for researchers and artists interested in exploring text-to-image synthesis with fine-grained control over the generation process. The tool allows for image blending, weighted prompts, and various diffusion schedulers, offering a flexible approach to creative AI.
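As a rough illustration of the weighted-prompt format, the sketch below splits a string such as "prompt1:1.0|prompt2:-0.5" into (text, weight) pairs. The helper name `parse_weighted_prompts` and the default weight of 1.0 are assumptions made for this example; it is not the tool's own parser, which may handle edge cases differently.

```python
from typing import List, Tuple


def parse_weighted_prompts(spec: str, default_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Split 'text:weight|text:weight' into (text, weight) pairs (hypothetical helper)."""
    pairs: List[Tuple[str, float]] = []
    for chunk in spec.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the last colon so the prompt text may itself contain colons.
        text, sep, weight = chunk.rpartition(":")
        if sep:
            try:
                pairs.append((text.strip(), float(weight)))
                continue
            except ValueError:
                pass  # trailing part is not a number: treat the whole chunk as text
        pairs.append((chunk, default_weight))
    return pairs


if __name__ == "__main__":
    print(parse_weighted_prompts("prompt1:1.0|prompt2:-0.5"))
```

A negative weight, as in the second pair, would presumably steer the image away from that prompt rather than towards it.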
How It Works
The core of the tool leverages Katherine Crowson's guided diffusion models, combined with CLIP for text-image alignment. It iteratively refines an image based on a text prompt, guiding the diffusion process towards a representation that matches the prompt's semantic meaning as interpreted by CLIP. Users can influence the generation by adjusting parameters like clip_guidance_scale, tv_scale (for smoothness), and timestep_respacing (to trade speed for accuracy). It also supports blending with an initial image using perceptual loss.
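To make the roles of clip_guidance_scale and tv_scale concrete, here is a minimal sketch of how a CLIP term and a total-variation (smoothness) term could be combined into one guidance loss. It uses plain Python on a small 2D grid; the real tool works on image tensors and follows the gradient of such a loss, which is not shown here. The function names `tv_loss` and `guidance_loss`, the `clip_distance` input, and the scale values in the usage lines are assumptions for illustration only.

```python
def tv_loss(image):
    """Mean squared difference between neighbouring pixels of a 2D grid."""
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # horizontal neighbour
                total += (image[y][x + 1] - image[y][x]) ** 2
                count += 1
            if y + 1 < height:  # vertical neighbour
                total += (image[y + 1][x] - image[y][x]) ** 2
                count += 1
    return total / count if count else 0.0


def guidance_loss(clip_distance, image, clip_guidance_scale, tv_scale):
    """Weighted sum of a CLIP mismatch term and a smoothness penalty (sketch)."""
    return clip_guidance_scale * clip_distance + tv_scale * tv_loss(image)


if __name__ == "__main__":
    smooth = [[0.5, 0.5], [0.5, 0.5]]
    noisy = [[0.0, 1.0], [1.0, 0.0]]
    # Illustrative scale values, not the tool's documented defaults.
    print(guidance_loss(0.5, smooth, clip_guidance_scale=1000.0, tv_scale=150.0))
    print(guidance_loss(0.5, noisy, clip_guidance_scale=1000.0, tv_scale=150.0))
```

With the same CLIP term, the noisy grid gets a higher loss than the smooth one, which is the sense in which a larger tv_scale favours smoother outputs.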
Quick Start & Requirements
Clone the repository and the guided-diffusion submodule, and install it:
git clone https://github.com/afiaka87/clip-guided-diffusion.git
cd clip-guided-diffusion
git clone https://github.com/crowsonkb/guided-diffusion.git
pip3 install -e guided-diffusion
python3 setup.py install
Run a first generation:
cgd --prompts "Alien friend by Odilon Redo"
Files are cached in ~/.cache/clip-guided-diffusion/
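Highlighted Details
- Weighted prompts (e.g. "prompt1:1.0|prompt2:-0.5") for nuanced control.
- Blending with an initial image (--init_image) and skipping initial timesteps.
- Image dimension offsets via width_offset and height_offset.
- Weights & Biases integration (wandb) for logging intermediate outputs.
Maintenance & Community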
The original author notes that their efforts have shifted to pyglide and that bug fixes may be slow. The README recommends exploring @crowsonkb's v-diffusion-pytorch.
Licensing & Compatibility
The repository itself does not explicitly state a license in the README. The underlying guided-diffusion library by crowsonkb is typically distributed under permissive licenses like MIT, but this should be verified for the specific version used.
Limitations & Caveats
The project is in maintenance mode, with the author focusing on other projects. This may lead to slower bug fixes or feature development. The 64x64 checkpoint requires specific parameter tuning (clip_guidance_scale, tv_scale) due to a different noise scheduler.