glid-3-xl by Jack000

Latent diffusion model for image generation and editing

created 3 years ago
265 stars

Top 97.2% on sourcepulse

Project Summary

This repository provides GLID-3-XL, a 1.4 billion parameter latent diffusion model, back-ported to the guided diffusion codebase. It enables users to fine-tune diffusion models for tasks like image generation, inpainting, and super-resolution, offering a powerful tool for creative AI applications.

How It Works

GLID-3-XL leverages a latent diffusion model architecture, operating in a compressed latent space for efficiency. It utilizes a text encoder (BERT) for prompt understanding and a diffusion model for iterative image generation. The model is split into multiple checkpoints, allowing for modularity and targeted fine-tuning on custom datasets or specific tasks like inpainting.
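
The checkpoint split shows up directly in how the sampling script is invoked. Below is a minimal sketch of a generation call that passes each checkpoint explicitly; the flag names (--bert_path, --kl_path, --model_path, --text) are assumptions based on the description above and should be verified against python sample.py --help.

    # text encoder, first-stage autoencoder, and diffusion weights are loaded
    # from separate checkpoints (flag names assumed; verify with `python sample.py --help`)
    python sample.py \
      --bert_path bert.pt \
      --kl_path kl-f8.pt \
      --model_path diffusion.pt \
      --text "a painting of a lighthouse at sunset"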

Quick Start & Requirements

  • Install: Clone the repository, install latent diffusion (pip install latent-diffusion), then install this project (pip install -e .); a combined sketch follows this list.
  • Model Files: Download checkpoints for the text encoder (bert.pt), LDM first stage (kl-f8.pt), and diffusion models (diffusion.pt, finetune.pt, inpaint.pt).
  • Prerequisites: Python, PyTorch, and potentially PyQt5 for GUI-based inpainting. GPU acceleration is highly recommended for generation.
  • Generation: Use python sample.py with various arguments for text-to-image generation, CLIP guidance, or image-to-image editing.
  • Docs: https://github.com/Jack000/glid-3-xl
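
Putting the steps above together, a hedged end-to-end sketch. Checkpoint download URLs are listed in the repository README and are not reproduced here, and the sample.py flags are assumptions to check against the script itself.

    # install (per the Quick Start above)
    git clone https://github.com/Jack000/glid-3-xl
    cd glid-3-xl
    pip install latent-diffusion
    pip install -e .

    # fetch bert.pt, kl-f8.pt, and diffusion.pt / finetune.pt / inpaint.pt
    # from the links in the README, then run a basic text-to-image job
    python sample.py --model_path finetune.pt --batch_size 4 --num_batches 4 \
      --text "a watercolor painting of a mountain lake at dawn"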

Highlighted Details

  • Supports text-to-image generation, inpainting, and uncropping.
  • Offers options for classifier-free guidance and CLIP guidance for improved prompt adherence (example invocations follow this list).
  • Includes an autoedit.py script for continuous image editing based on CLIP score maximization.
  • Fine-tuning scripts are provided for custom dataset training, though 24GB VRAM is insufficient for current training configurations.
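
Illustrative invocations for the modes listed above. The flags (--clip_guidance, --edit, --mask) and the autoedit.py arguments are assumptions based on the feature descriptions here; confirm them against the scripts' argument parsers before use.

    # CLIP-guided generation (slower per sample, so smaller batches)
    python sample.py --clip_guidance --model_path finetune.pt \
      --batch_size 1 --num_batches 4 --text "an oil painting of a lighthouse"

    # inpainting: provide a source image plus a mask marking the region to regenerate
    python sample.py --model_path inpaint.pt --edit input.png --mask mask.png \
      --text "a vase of flowers on the table"

    # continuous editing that keeps the variants with the best CLIP score
    python autoedit.py --model_path inpaint.pt --edit input.png \
      --text "a vase of flowers on the table"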

Maintenance & Community

The project appears to be a personal or research-oriented release by Jack000, with no explicit mention of a broader community, ongoing maintenance, or partnerships.

Licensing & Compatibility

The repository does not explicitly state a license. The underlying latent diffusion codebase may have its own licensing terms. Users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

Training requires more than 24GB of VRAM. The inpainting training is marked as "wip" (work in progress). The project does not specify compatibility with newer PyTorch versions or other diffusion libraries.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai
0.0%, 6k stars
PyTorch code for consistency models research paper
created 2 years ago, updated 1 year ago

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis
0.1%, 71k stars
Latent text-to-image diffusion model
created 3 years ago, updated 1 year ago