CLIP-Guided-Diffusion by nerdyrodent

Local text-to-image diffusion using CLIP guidance

Created 3 years ago · 387 stars

Project Summary

This repository provides a local implementation of CLIP-guided diffusion for text-to-image generation, so users can run everything on their own hardware instead of in cloud-based Colab notebooks. It targets researchers and hobbyists interested in AI art generation and offers a flexible way to experiment with diffusion models and CLIP for creative image synthesis.

How It Works

The project combines OpenAI's guided diffusion models (available at 256x256 and 512x512 resolutions) with the CLIP model, which scores how well an image matches a text prompt; the gradient of that score steers each denoising step toward the prompt. Weighted and multiple text prompts are supported, as are image prompts. The approach offers fine-grained control over generation parameters such as guidance scale, number of diffusion steps, and output smoothness.
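
The README does not show the guidance internals, but the technique is well documented in the Katherine Crowson notebooks this repo builds on. The following is a minimal illustrative sketch, not the repository's actual code, and names such as cond_fn, clip_model, and text_embed are assumptions: at each denoising step the current sample is embedded with CLIP, compared to the prompt embedding via a spherical distance loss, and the gradient of that loss is handed back to guided-diffusion to shift the sample toward the prompt.

    import torch
    import torch.nn.functional as F

    def spherical_dist(x, y):
        # Geodesic distance between L2-normalized embeddings; the standard
        # CLIP-guidance loss from Katherine Crowson's notebooks.
        x = F.normalize(x, dim=-1)
        y = F.normalize(y, dim=-1)
        return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)

    def cond_fn(x, t, clip_model, text_embed, clip_guidance_scale=1000):
        # guided-diffusion calls this at every sampling step and adds the
        # returned gradient to the predicted mean, nudging the sample
        # toward the prompt.
        with torch.enable_grad():
            x = x.detach().requires_grad_()
            # Downsample to CLIP's 224x224 input; real implementations embed
            # many random augmented cutouts here instead of a single resize.
            x_in = F.interpolate(x, size=224, mode='bilinear', align_corners=False)
            image_embed = clip_model.encode_image(x_in).float()
            loss = spherical_dist(image_embed, text_embed).mean()
            return -torch.autograd.grad(loss * clip_guidance_scale, x)[0]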

Quick Start & Requirements

  • Install: run conda create --name cgd python=3.9 and conda activate cgd, then git clone ..., cd CLIP-Guided-Diffusion, and ./setup.sh (or the equivalent manual commands).
  • Prerequisites: Ubuntu 20.04 (Windows untested), Anaconda, NVIDIA GPU (RTX 3090 recommended), CUDA 11.1.
  • VRAM: 10 GB for 256x256, 18 GB for 512x512.
  • Dependencies: PyTorch 1.9.0+cu111, CLIP, guided-diffusion, lpips, matplotlib.
  • Models: Download the unconditional ImageNet diffusion checkpoints (256x256 and 512x512); a loading sketch follows this list.
  • Docs: OpenAI CLIP, OpenAI guided-diffusion.
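
As a rough orientation for what the repo's scripts do with those downloads, here is a sketch of loading the 256x256 checkpoint through OpenAI's guided-diffusion API together with CLIP. The config values are the ones published for that checkpoint, but treat them and the example prompt as illustrative assumptions; the repo's own scripts handle this setup for you.

    import torch
    import clip
    from guided_diffusion.script_util import (
        create_model_and_diffusion, model_and_diffusion_defaults)

    device = torch.device('cuda')

    # Configuration for OpenAI's 256x256 unconditional ImageNet checkpoint.
    config = model_and_diffusion_defaults()
    config.update({
        'image_size': 256,
        'class_cond': False,
        'learn_sigma': True,
        'num_channels': 256,
        'num_res_blocks': 2,
        'num_head_channels': 64,
        'attention_resolutions': '32,16,8',
        'resblock_updown': True,
        'use_scale_shift_norm': True,
        'timestep_respacing': '1000',  # e.g. '250' trades quality for speed
    })

    model, diffusion = create_model_and_diffusion(**config)
    model.load_state_dict(
        torch.load('256x256_diffusion_uncond.pt', map_location='cpu'))
    model.eval().to(device)

    # CLIP links the text prompt to the image being sampled.
    clip_model, _ = clip.load('ViT-B/16', device=device)
    tokens = clip.tokenize('a watercolor painting of a lighthouse').to(device)
    text_embed = clip_model.encode_text(tokens).float()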

Highlighted Details

  • Supports multiple text prompts with adjustable weights.
  • Allows for image prompts and initial image seeding.
  • Capable of generating videos from diffusion steps, with optional upscaling via Real-ESRGAN.
  • Offers extensive command-line arguments for parameter tuning (e.g., clip_guidance_scale, tv_scale, diffusion_steps); the tv_scale smoothing term is sketched after this list.
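
The tv_scale argument above weights a total-variation penalty that suppresses high-frequency noise in the output. A minimal sketch of that regularizer (the generic technique, not the repo's exact code):

    import torch

    def tv_loss(x):
        # Total variation: penalize differences between neighboring pixels
        # in an image batch of shape (batch, channels, height, width).
        dh = (x[..., :, 1:] - x[..., :, :-1]).pow(2).mean()
        dv = (x[..., 1:, :] - x[..., :-1, :]).pow(2).mean()
        return dh + dv

    # During guidance this is combined with the CLIP loss, so higher
    # tv_scale values yield smoother, less noisy images:
    # total = clip_loss * clip_guidance_scale + tv_loss(x) * tv_scale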

Maintenance & Community

  • Based on work by Katherine Crowson.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license for this repository. It references OpenAI's CLIP and guided-diffusion projects, which have their own licenses.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The author describes the project as "just playing," so it should not be treated as production-ready. Windows compatibility is untested. The setup pins older versions of PyTorch (1.9.0) and CUDA (11.1), which may conflict with other environments.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days
