clip-guided-diffusion by afiaka87

CLI tool for text-to-image generation using CLIP-guided diffusion

Created 4 years ago
463 stars

Top 65.5% on SourcePulse

Project Summary

This repository provides a command-line interface (CLI) and Python module for generating images from text prompts using guided diffusion models and OpenAI's CLIP. It's designed for researchers and artists interested in exploring text-to-image synthesis with fine-grained control over the generation process. The tool allows for image blending, weighted prompts, and various diffusion schedulers, offering a flexible approach to creative AI.

How It Works

The core of the tool leverages Katherine Crowson's guided diffusion models, combined with CLIP for text-image alignment. It iteratively refines an image based on a text prompt, guiding the diffusion process towards a representation that matches the prompt's semantic meaning as interpreted by CLIP. Users can influence the generation by adjusting parameters like clip_guidance_scale, tv_scale (for smoothness), and timestep_respacing (to trade speed for accuracy). It also supports blending with an initial image using perceptual loss.
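The interplay of these parameters can be sketched in miniature. The following is a toy illustration only, not the tool's actual code: the embed function is a hypothetical stand-in for CLIP's image encoder, and finite differences replace the autograd the real tool uses to backpropagate through CLIP and the diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))  # hypothetical stand-in for CLIP's image encoder

def embed(x):
    v = W @ x.ravel()
    return v / np.linalg.norm(v)

def tv_loss(x):
    # Total variation: sum of absolute differences between neighboring
    # "pixels"; this is the smoothness term that tv_scale weights.
    img = x.reshape(4, 4)
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

clip_guidance_scale = 1000.0  # strength of the CLIP similarity term
tv_scale = 0.5                # strength of the smoothness term

def loss(x, target):
    # Pull the image toward the prompt embedding; penalize roughness.
    return clip_guidance_scale * (1.0 - embed(x) @ target) + tv_scale * tv_loss(x)

def grad(x, target, eps=1e-4):
    # Finite-difference gradient of the combined loss (toy substitute
    # for backpropagation through CLIP and the diffusion model).
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (loss(x + d, target) - loss(x - d, target)) / (2 * eps)
    return g

target = embed(rng.normal(size=16))  # pretend embedding of the text prompt
x = rng.normal(size=16)              # "noisy image" being refined
sim_before = float(embed(x) @ target)

for _ in range(100):                 # stands in for the denoising timesteps
    x = x - 1e-4 * grad(x, target)

sim_after = float(embed(x) @ target)
print(sim_before, sim_after)
```

After the loop, the image's embedding is noticeably more similar to the prompt embedding, which is the essence of CLIP guidance; raising clip_guidance_scale strengthens that pull, while raising tv_scale trades prompt fidelity for smoothness.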

Quick Start & Requirements

  • Install: Clone the repository, clone crowsonkb's guided-diffusion into it, then install both:
    git clone https://github.com/afiaka87/clip-guided-diffusion.git
    cd clip-guided-diffusion
    git clone https://github.com/crowsonkb/guided-diffusion.git
    pip3 install -e guided-diffusion
    python3 setup.py install
    
  • Run: cgd --prompts "Alien friend by Odilon Redo"
  • Prerequisites: Python 3, CUDA-enabled GPU recommended (CPU is very slow). Checkpoints are downloaded automatically to ~/.cache/clip-guided-diffusion/.
  • Docs: Full Usage

Highlighted Details

  • Supports weighted prompts (e.g., "prompt1:1.0|prompt2:-0.5") for nuanced control.
  • Allows blending with an existing image (--init_image) and skipping initial timesteps.
  • Offers experimental support for non-square image generation via width_offset and height_offset.
  • Integrates with Weights & Biases (wandb) for logging intermediate outputs.
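
The weighted-prompt syntax can be parsed into (text, weight) pairs as follows. This is a hypothetical sketch of the "text:weight" format described above, not the tool's actual parser; a missing weight is assumed to default to 1.0.

```python
def parse_prompts(spec):
    """Split a spec like 'prompt1:1.0|prompt2:-0.5' into (text, weight) pairs.

    Prompts without an explicit numeric weight get weight 1.0; negative
    weights steer the image away from that prompt.
    """
    pairs = []
    for part in spec.split("|"):
        text, sep, weight = part.rpartition(":")
        if sep and _is_number(weight):
            pairs.append((text, float(weight)))
        else:
            pairs.append((part, 1.0))  # no valid trailing weight
    return pairs

def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

print(parse_prompts("prompt1:1.0|prompt2:-0.5"))
```

Using rpartition rather than a plain split keeps prompts that themselves contain colons intact; only a trailing numeric segment is treated as a weight.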

Maintenance & Community

The original author notes that their effort has shifted to pyglide and that they may be slow to address bugs; they recommend exploring @crowsonkb's v-diffusion-pytorch instead.

Licensing & Compatibility

The repository itself does not explicitly state a license in the README. The underlying guided-diffusion library by crowsonkb is typically distributed under permissive licenses like MIT, but this should be verified for the specific version used.

Limitations & Caveats

The project is in maintenance mode, with the author focusing on other projects. This may lead to slower bug fixes or feature development. The 64x64 checkpoint requires specific parameter tuning (clip_guidance_scale, tv_scale) due to a different noise scheduler.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

glide-text2im by openai
Top 0.1% on SourcePulse · 4k stars
Text-conditional image synthesis model from research paper
Created 3 years ago · Updated 1 year ago
Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 7 more.

sygil-webui by Sygil-Dev
Top 0.0% on SourcePulse · 8k stars
Web UI for Stable Diffusion
Created 3 years ago · Updated 2 months ago
Starred by Deepak Pathak (Cofounder of Skild AI; Professor at CMU), Travis Fischer (Founder of Agentic), and 8 more.

stable-diffusion by CompVis
Top 0.1% on SourcePulse · 71k stars
Latent text-to-image diffusion model
Created 3 years ago · Updated 1 year ago
Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 57 more.