Latent diffusion model for image generation and editing
This repository provides GLID-3-XL, a 1.4 billion parameter latent diffusion model, back-ported to the guided diffusion codebase. It enables users to fine-tune diffusion models for tasks like image generation, inpainting, and super-resolution, offering a powerful tool for creative AI applications.
How It Works
GLID-3-XL uses a latent diffusion architecture, operating in a compressed latent space for efficiency. A BERT text encoder handles prompt understanding, and a diffusion model generates the image through iterative denoising. The model is split into multiple checkpoints, allowing modularity and targeted fine-tuning on custom datasets or specific tasks such as inpainting; a minimal sketch of the sampling loop follows.
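The flow can be pictured with a toy Python sketch. Everything below is an illustrative stand-in rather than the actual GLID-3-XL code: the three functions play the roles of the `bert.pt`, `diffusion.pt`, and `kl-f8.pt` checkpoints, and the shapes assume the f=8 first stage mapping 256x256 images to 32x32 latents.

```python
# Illustrative stand-ins only: the real components live in separate,
# far larger checkpoints (bert.pt, diffusion.pt, kl-f8.pt).
import torch

torch.manual_seed(0)

# Text encoder stand-in (role of bert.pt): prompt -> conditioning embeddings.
def encode_text(prompt: str) -> torch.Tensor:
    return torch.randn(1, 77, 1280)

# Denoiser stand-in (role of diffusion.pt): one reverse-diffusion step.
def denoise(z: torch.Tensor, t: int, ctx: torch.Tensor) -> torch.Tensor:
    return z - 0.01 * torch.randn_like(z)

# First-stage decoder stand-in (role of kl-f8.pt): latent -> pixels,
# upsampling by the f=8 factor.
def decode(z: torch.Tensor) -> torch.Tensor:
    up = z.repeat_interleave(8, -1).repeat_interleave(8, -2)
    return torch.sigmoid(up[:, :3])

ctx = encode_text("a lighthouse at dusk")  # conditioning from the prompt
z = torch.randn(1, 4, 32, 32)              # pure noise in the compressed latent space

for t in reversed(range(50)):              # iterative denoising
    z = denoise(z, t, ctx)

image = decode(z)                          # back to pixel space
print(image.shape)                         # torch.Size([1, 3, 256, 256])
```

Because the loop runs on 32x32 latents rather than 256x256 pixels, each denoising step is far cheaper than it would be in pixel space.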
Quick Start & Requirements
- Install the latent diffusion dependency (`pip install latent-diffusion`), then install this project (`pip install -e .`).
- Download the text encoder (`bert.pt`), the LDM first stage (`kl-f8.pt`), and the diffusion models (`diffusion.pt`, `finetune.pt`, `inpaint.pt`).
- Run `python sample.py` with various arguments for text-to-image generation, CLIP guidance, or image-to-image editing; an example sketch follows below.
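A typical invocation might look like the following. The flag names here are assumptions drawn from common diffusion-sampler CLIs, not verified against this repository; run `python sample.py --help` for the authoritative list.

```python
# Hedged example of a text-to-image run; every flag below is an
# assumption and should be checked against `python sample.py --help`.
import subprocess

subprocess.run(
    [
        "python", "sample.py",
        "--model_path", "finetune.pt",  # which diffusion checkpoint to sample from
        "--text", "an oil painting of a lighthouse at dusk",
        "--batch_size", "4",            # images per batch (assumed flag name)
    ],
    check=True,
)
```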
Highlighted Details
- An `autoedit.py` script for continuous image editing based on CLIP score maximization (sketched below).
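As a rough illustration of the idea, the sketch below keeps a candidate edit only when it raises a stand-in CLIP score. Both `clip_score` and `regenerate` are hypothetical placeholders for the real CLIP image/text similarity and diffusion re-sampling steps; this is not the actual `autoedit.py` logic.

```python
# Hypothetical sketch of CLIP-score-maximization editing; clip_score and
# regenerate stand in for real CLIP similarity and diffusion re-sampling.
import torch

torch.manual_seed(0)
prompt = "a castle at sunset"

def clip_score(image: torch.Tensor, prompt: str) -> float:
    # Stand-in for cosine similarity between CLIP image and text embeddings.
    return float(image.mean())

def regenerate(image: torch.Tensor) -> torch.Tensor:
    # Stand-in for perturbing the latent and re-running the diffusion model.
    return image + 0.05 * torch.randn_like(image)

best = torch.rand(3, 256, 256)              # initial generation
best_score = clip_score(best, prompt)

for _ in range(20):
    candidate = regenerate(best)            # propose an edit
    score = clip_score(candidate, prompt)
    if score > best_score:                  # keep only score-improving edits
        best, best_score = candidate, score

print(f"stand-in CLIP score after 20 proposals: {best_score:.4f}")
```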
Maintenance & Community
The project appears to be a personal or research-oriented release by Jack000, with no explicit mention of a broader community, ongoing maintenance, or partnerships.
Licensing & Compatibility
The repository does not explicitly state a license. The underlying latent diffusion codebase may have its own licensing terms. Users should verify compatibility for commercial or closed-source use.
Limitations & Caveats
Training requires more than 24GB of VRAM. The inpainting training is marked as "wip" (work in progress). The project does not specify compatibility with newer PyTorch versions or other diffusion libraries.