CLIP-Guided-Diffusion by nerdyrodent

Local text-to-image diffusion using CLIP guidance

Created 3 years ago · 387 stars

Project Summary

This repository provides a local implementation of CLIP-guided diffusion for text-to-image generation, so users can run everything on their own hardware instead of in cloud-based Colab notebooks. It targets researchers and hobbyists interested in AI art generation and offers a flexible way to experiment with diffusion models and CLIP for creative image synthesis.

How It Works

The project combines OpenAI's guided diffusion models (available at 256x256 and 512x512 resolutions) with the CLIP model, which scores how well an image matches a text prompt; the gradient of that score steers each denoising step toward the prompt. Weighted and multiple text prompts are supported, as are image prompts. The approach offers fine-grained control over generation parameters such as guidance scale, number of diffusion steps, and output smoothness.
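
The README does not show the guidance internals, but the technique is well documented in the Katherine Crowson notebooks this repo builds on. The following is a minimal illustrative sketch, not the repository's actual code, and names such as cond_fn, clip_model, and text_embed are assumptions: at each denoising step the current sample is embedded with CLIP, compared to the prompt embedding via a spherical distance loss, and the gradient of that loss is handed back to guided-diffusion to shift the sample toward the prompt.

    import torch
    import torch.nn.functional as F

    def spherical_dist(x, y):
        # Geodesic distance between L2-normalized embeddings; the standard
        # CLIP-guidance loss from Katherine Crowson's notebooks.
        x = F.normalize(x, dim=-1)
        y = F.normalize(y, dim=-1)
        return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)

    def cond_fn(x, t, clip_model, text_embed, clip_guidance_scale=1000):
        # guided-diffusion calls this at every sampling step and adds the
        # returned gradient to the predicted mean, nudging the sample
        # toward the prompt.
        with torch.enable_grad():
            x = x.detach().requires_grad_()
            # Downsample to CLIP's 224x224 input; real implementations embed
            # many random augmented cutouts here instead of a single resize.
            x_in = F.interpolate(x, size=224, mode='bilinear', align_corners=False)
            image_embed = clip_model.encode_image(x_in).float()
            loss = spherical_dist(image_embed, text_embed).mean()
            return -torch.autograd.grad(loss * clip_guidance_scale, x)[0]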

Quick Start & Requirements

  • Install: run conda create --name cgd python=3.9 and conda activate cgd, then git clone ..., cd CLIP-Guided-Diffusion, and ./setup.sh (or the equivalent manual commands).
  • Prerequisites: Ubuntu 20.04 (Windows untested), Anaconda, NVIDIA GPU (RTX 3090 recommended), CUDA 11.1.
  • VRAM: 10 GB for 256x256, 18 GB for 512x512.
  • Dependencies: PyTorch 1.9.0+cu111, CLIP, guided-diffusion, lpips, matplotlib.
  • Models: Download the unconditional ImageNet diffusion checkpoints (256x256 and 512x512); a loading sketch follows this list.
  • Docs: OpenAI CLIP, OpenAI guided-diffusion.
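
As a rough orientation for what the repo's scripts do with those downloads, here is a sketch of loading the 256x256 checkpoint through OpenAI's guided-diffusion API together with CLIP. The config values are the ones published for that checkpoint, but treat them and the example prompt as illustrative assumptions; the repo's own scripts handle this setup for you.

    import torch
    import clip
    from guided_diffusion.script_util import (
        create_model_and_diffusion, model_and_diffusion_defaults)

    device = torch.device('cuda')

    # Configuration for OpenAI's 256x256 unconditional ImageNet checkpoint.
    config = model_and_diffusion_defaults()
    config.update({
        'image_size': 256,
        'class_cond': False,
        'learn_sigma': True,
        'num_channels': 256,
        'num_res_blocks': 2,
        'num_head_channels': 64,
        'attention_resolutions': '32,16,8',
        'resblock_updown': True,
        'use_scale_shift_norm': True,
        'timestep_respacing': '1000',  # e.g. '250' trades quality for speed
    })

    model, diffusion = create_model_and_diffusion(**config)
    model.load_state_dict(
        torch.load('256x256_diffusion_uncond.pt', map_location='cpu'))
    model.eval().to(device)

    # CLIP links the text prompt to the image being sampled.
    clip_model, _ = clip.load('ViT-B/16', device=device)
    tokens = clip.tokenize('a watercolor painting of a lighthouse').to(device)
    text_embed = clip_model.encode_text(tokens).float()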

Highlighted Details

  • Supports multiple text prompts with adjustable weights.
  • Allows for image prompts and initial image seeding.
  • Capable of generating videos from diffusion steps, with optional upscaling via Real-ESRGAN.
  • Offers extensive command-line arguments for parameter tuning (e.g., clip_guidance_scale, tv_scale, diffusion_steps); the tv_scale smoothing term is sketched after this list.
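
The tv_scale argument above weights a total-variation penalty that suppresses high-frequency noise in the output. A minimal sketch of that regularizer (the generic technique, not the repo's exact code):

    import torch

    def tv_loss(x):
        # Total variation: penalize differences between neighboring pixels
        # in an image batch of shape (batch, channels, height, width).
        dh = (x[..., :, 1:] - x[..., :, :-1]).pow(2).mean()
        dv = (x[..., 1:, :] - x[..., :-1, :]).pow(2).mean()
        return dh + dv

    # During guidance this is combined with the CLIP loss, so higher
    # tv_scale values yield smoother, less noisy images:
    # total = clip_loss * clip_guidance_scale + tv_loss(x) * tv_scale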

Maintenance & Community

  • Based on work by Katherine Crowson.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license for this repository. It references OpenAI's CLIP and guided-diffusion projects, which have their own licenses.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The author describes the project as "just playing," so it should not be treated as production-ready. Windows compatibility is untested. The setup pins older versions of PyTorch (1.9.0) and CUDA (11.1), which may conflict with other environments.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days
