glid-3-xl by Jack000

Latent diffusion model for image generation and editing

created 3 years ago
265 stars

Top 97.2% on sourcepulse

Project Summary

This repository provides GLID-3-XL, a 1.4 billion parameter latent diffusion model, back-ported to the guided diffusion codebase. It enables users to fine-tune diffusion models for tasks like image generation, inpainting, and super-resolution, offering a powerful tool for creative AI applications.

How It Works

GLID-3-XL leverages a latent diffusion model architecture, operating in a compressed latent space for efficiency. It utilizes a text encoder (BERT) for prompt understanding and a diffusion model for iterative image generation. The model is split into multiple checkpoints, allowing for modularity and targeted fine-tuning on custom datasets or specific tasks like inpainting.
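
The checkpoint split shows up directly in how the sampling script is invoked. Below is a minimal sketch of a generation call that passes each checkpoint explicitly; the flag names (--bert_path, --kl_path, --model_path, --text) are assumptions based on the description above and should be verified against python sample.py --help.

    # text encoder, first-stage autoencoder, and diffusion weights are loaded
    # from separate checkpoints (flag names assumed; verify with `python sample.py --help`)
    python sample.py \
      --bert_path bert.pt \
      --kl_path kl-f8.pt \
      --model_path diffusion.pt \
      --text "a painting of a lighthouse at sunset"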

Quick Start & Requirements

  • Install: Clone the repository, install latent diffusion (pip install latent-diffusion), then install this project (pip install -e .); a combined sketch follows this list.
  • Model Files: Download checkpoints for the text encoder (bert.pt), LDM first stage (kl-f8.pt), and diffusion models (diffusion.pt, finetune.pt, inpaint.pt).
  • Prerequisites: Python, PyTorch, and potentially PyQt5 for GUI-based inpainting. GPU acceleration is highly recommended for generation.
  • Generation: Use python sample.py with various arguments for text-to-image generation, CLIP guidance, or image-to-image editing.
  • Docs: https://github.com/Jack000/glid-3-xl
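
Putting the steps above together, a hedged end-to-end sketch. Checkpoint download URLs are listed in the repository README and are not reproduced here, and the sample.py flags are assumptions to check against the script itself.

    # install (per the Quick Start above)
    git clone https://github.com/Jack000/glid-3-xl
    cd glid-3-xl
    pip install latent-diffusion
    pip install -e .

    # fetch bert.pt, kl-f8.pt, and diffusion.pt / finetune.pt / inpaint.pt
    # from the links in the README, then run a basic text-to-image job
    python sample.py --model_path finetune.pt --batch_size 4 --num_batches 4 \
      --text "a watercolor painting of a mountain lake at dawn"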

Highlighted Details

  • Supports text-to-image generation, inpainting, and uncropping.
  • Offers options for classifier-free guidance and CLIP guidance for improved prompt adherence (example invocations follow this list).
  • Includes an autoedit.py script for continuous image editing based on CLIP score maximization.
  • Fine-tuning scripts are provided for custom dataset training, though 24GB VRAM is insufficient for current training configurations.
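
Illustrative invocations for the modes listed above. The flags (--clip_guidance, --edit, --mask) and the autoedit.py arguments are assumptions based on the feature descriptions here; confirm them against the scripts' argument parsers before use.

    # CLIP-guided generation (slower per sample, so smaller batches)
    python sample.py --clip_guidance --model_path finetune.pt \
      --batch_size 1 --num_batches 4 --text "an oil painting of a lighthouse"

    # inpainting: provide a source image plus a mask marking the region to regenerate
    python sample.py --model_path inpaint.pt --edit input.png --mask mask.png \
      --text "a vase of flowers on the table"

    # continuous editing that keeps the variants with the best CLIP score
    python autoedit.py --model_path inpaint.pt --edit input.png \
      --text "a vase of flowers on the table"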

Maintenance & Community

The project appears to be a personal or research-oriented release by Jack000, with no explicit mention of a broader community, ongoing maintenance, or partnerships.

Licensing & Compatibility

The repository does not explicitly state a license. The underlying latent diffusion codebase may have its own licensing terms. Users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

Training requires more than 24GB of VRAM. The inpainting training is marked as "wip" (work in progress). The project does not specify compatibility with newer PyTorch versions or other diffusion libraries.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai
0.0%, 6k stars
PyTorch code for consistency models research paper
created 2 years ago, updated 1 year ago

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis
0.1%, 71k stars
Latent text-to-image diffusion model
created 3 years ago, updated 1 year ago