CLIP-Guided-Diffusion by nerdyrodent

Local text-to-image diffusion using CLIP guidance

Created 4 years ago
386 stars

Top 74.1% on SourcePulse

Project Summary

This repository provides a local implementation of CLIP-guided diffusion for text-to-image generation, enabling users to bypass cloud-based Colab notebooks. It targets researchers and hobbyists interested in AI art generation and offers a flexible way to experiment with diffusion models and CLIP for creative image synthesis.

How It Works

The project pairs OpenAI's guided diffusion models (released at 256x256 and 512x512 resolutions) with the CLIP model, which scores how well intermediate images match the text prompt and steers the diffusion process accordingly. It supports multiple, weighted text prompts as well as image prompts, and exposes fine-grained control over generation parameters such as the CLIP guidance scale, the number of diffusion steps, and output smoothness (via a total-variation loss weight).
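In classifier-guidance terms, each sampling step shifts the diffusion model's predicted mean in the direction that increases CLIP similarity. A hedged sketch of the update, with notation assumed (classifier guidance in the style of Dhariwal and Nichol, with CLIP similarity standing in for a classifier log-probability):

```latex
\tilde{\mu}_\theta(x_t) \;=\; \mu_\theta(x_t)
  \;+\; s\,\Sigma_\theta(x_t)\,
  \nabla_{x_t}\,\operatorname{sim}\!\left(
    \mathrm{CLIP}_{\text{img}}(\hat{x}_0(x_t)),\;
    \mathrm{CLIP}_{\text{txt}}(\text{prompt})
  \right)
```

Here \(\mu_\theta\) and \(\Sigma_\theta\) are the diffusion model's predicted mean and covariance, \(\hat{x}_0\) is the current denoised image estimate, and \(s\) is the guidance scale (the clip_guidance_scale parameter).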

Quick Start & Requirements

  • Install: conda create --name cgd python=3.9, then conda activate cgd; git clone ..., cd CLIP-Guided-Diffusion, and run ./setup.sh (or the manual commands from the README).
  • Prerequisites: Ubuntu 20.04 (Windows untested), Anaconda, NVIDIA GPU (RTX 3090 recommended), CUDA 11.1.
  • VRAM: 10 GB for 256x256, 18 GB for 512x512.
  • Dependencies: PyTorch 1.9.0+cu111, CLIP, guided-diffusion, lpips, matplotlib.
  • Models: Download unconditional ImageNet diffusion models (256x256 and 512x512).
  • Docs: OpenAI CLIP, OpenAI guided-diffusion.
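The install steps above can be sketched as a shell session (the clone URL is an assumption based on the project and author names; verify against the repository page):

```shell
# Create and activate an isolated environment (Python 3.9, per the README).
conda create --name cgd python=3.9
conda activate cgd

# Clone the repository -- URL assumed from the project/author names.
git clone https://github.com/nerdyrodent/CLIP-Guided-Diffusion
cd CLIP-Guided-Diffusion

# Run the bundled setup script, or follow the manual commands in the README.
./setup.sh
```

The diffusion model checkpoints (256x256 and 512x512 unconditional ImageNet) must still be downloaded separately, as noted above.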

Highlighted Details

  • Supports multiple text prompts with adjustable weights.
  • Allows for image prompts and initial image seeding.
  • Capable of generating videos from diffusion steps, with optional upscaling via Real-ESRGAN.
  • Offers extensive command-line arguments for parameter tuning (e.g., clip_guidance_scale, tv_scale, diffusion_steps).
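A hypothetical invocation tying the flags above together (the script name and exact flag spellings are assumptions based on the parameters this summary mentions; the repository README documents the real interface, and running it requires a GPU plus the downloaded checkpoints):

```shell
# Hypothetical example -- script name and flag spellings are assumptions.
python generate_diffuse.py \
  -p "A surreal watercolor painting of a fox" \
  --clip_guidance_scale 1000 \
  --tv_scale 150 \
  --diffusion_steps 1000
```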

Maintenance & Community

  • Based on work by Katherine Crowson.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not state a license for this repository. It builds on OpenAI's CLIP and guided-diffusion projects, which carry their own licenses.
  • Whether the code may be used commercially or linked from closed-source software is therefore unspecified.

Limitations & Caveats

The project is described as "just playing," suggesting it is not production-ready. Windows compatibility is untested. The setup pins older versions of PyTorch (1.9.0) and CUDA (11.1), which may conflict with other environments.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

  • Kandinsky-2 by ai-forever (3k stars): multilingual text-to-image latent diffusion model. Created 2 years ago, updated 1 year ago. Starred by Deepak Pathak (Cofounder of Skild AI; Professor at CMU), Travis Fischer (Founder of Agentic), and 8 more.
  • sygil-webui by Sygil-Dev (8k stars): web UI for Stable Diffusion. Created 3 years ago, updated 2 months ago. Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 57 more.
  • stable-diffusion by CompVis (71k stars): latent text-to-image diffusion model. Created 3 years ago, updated 1 year ago.