StableDiffusion-PyTorch by explainingai-code

Generative image synthesis via a PyTorch implementation of Stable Diffusion

Created 2 years ago

254 stars

Top 99.1% on SourcePulse

Summary

This repository provides a PyTorch implementation of Stable Diffusion, offering code for training and inference of Latent Diffusion Models (LDMs). It targets researchers and developers seeking a flexible framework to experiment with unconditional and conditional generative models, enabling custom model development and exploration of various conditioning techniques.

How It Works

The project implements Latent Diffusion Models built on a VQVAE autoencoder and a DDPM (Denoising Diffusion Probabilistic Model) that uses a linear noise schedule. The autoencoder compresses images into a latent space, and the diffusion model is trained on those latents. Supported conditioning mechanisms are class labels, text embeddings (via CLIP or BERT), and semantic masks. Because the autoencoder, the diffusion model, and the conditioning are separate components, each can be swapped or tuned independently when experimenting with architectures and conditioning strategies.
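To make the linear schedule concrete, here is a minimal sketch of how such a schedule and the forward (noising) step are commonly computed. It is not taken from the repository: the function names are hypothetical, the defaults (1000 steps, beta from 1e-4 to 0.02) are the usual DDPM values and are assumed rather than stated above, and plain Python lists stand in for PyTorch tensors to keep the example dependency-free.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return betas spaced evenly from beta_start to beta_end (assumed defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to each step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng=random):
    """Sample x_t from q(x_t | x_0) for a flat list of latent values x0."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
# At the last step almost no signal remains, so the sample is close to pure noise.
noisy = add_noise([0.5, -0.25, 1.0], t=len(betas) - 1, alpha_bars=alpha_bars)
```

With a linear schedule the cumulative product shrinks monotonically, so later timesteps keep less of the original latent and more noise. The denoising network is then trained to predict the noise that was added at a randomly chosen timestep.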

Quick Start & Requirements

Setup involves creating a Python 3.8 conda environment, cloning the repository, and installing dependencies with pip install -r requirements.txt. The LPIPS weights file (vgg.pth) is not bundled: users must download it manually and place it at models/weights/v0.1/vgg.pth. Datasets (MNIST or CelebHQ) must be arranged in the directory structures the README specifies. Training an LDM typically requires substantial GPU resources, though the README notes that CPU training is feasible for small autoencoders on MNIST.
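Because the weights file has to be placed by hand, a small pre-flight check can catch a missing file before a long training run starts. The sketch below is illustrative only: the helper name is hypothetical and not part of the repository, and the only detail taken from the setup notes is the weights path itself.

```python
from pathlib import Path

# Path quoted in the setup notes above; everything else here is illustrative.
LPIPS_WEIGHTS = Path("models/weights/v0.1/vgg.pth")


def missing_prerequisites(repo_root=".", required=(LPIPS_WEIGHTS,)):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [str(rel) for rel in required if not (root / rel).exists()]


# Run from the repository root; an empty list means the file is in place.
missing = missing_prerequisites()
if missing:
    print("Missing files:", ", ".join(missing))
```

Dataset directories could be added to the required tuple in the same way once their expected locations are known from the README.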

Highlighted Details

Implements training and inference for unconditional and class-conditional LDMs on MNIST.
Supports unconditional, text-conditional (CLIP/BERT), and semantic mask-conditional LDMs on CelebHQ.
Offers flexibility in autoencoder choice (VAE or VQVAE, with VQVAE being the primary implementation).
Configuration is managed through YAML files (config/mnist.yaml, config/celebhq.yaml, etc.), allowing customization of training parameters and model components.
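Since experiments are driven by configuration files, a common pattern is to keep one base configuration and apply small per-experiment overrides. The sketch below shows that pattern on plain dictionaries. The helper and every key name are hypothetical and chosen only for illustration; the actual YAML files may be organized differently.

```python
import copy


def merge_config(base, overrides):
    """Recursively merge overrides into a copy of base without mutating either."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys for illustration; the real config files may use other names.
base_config = {
    "diffusion_params": {"num_timesteps": 1000, "beta_start": 1e-4, "beta_end": 0.02},
    "train_params": {"batch_size": 16, "epochs": 10},
}
# Override only the batch size, for example to fit a smaller GPU.
experiment = merge_config(base_config, {"train_params": {"batch_size": 4}})
```

In practice the base dictionary would come from parsing one of the YAML files, for instance with yaml.safe_load from PyYAML, assuming that package is installed.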

Maintenance & Community

The provided README does not contain information regarding maintainers, community channels (e.g., Discord, Slack), or a project roadmap.

Licensing & Compatibility

The repository's README does not specify a software license. This absence creates ambiguity regarding usage rights, redistribution, and compatibility with closed-source projects.

Limitations & Caveats

The README notes that some of its sample outputs come from models that had not fully converged, so they understate achievable quality. The lack of an explicit software license is a significant caveat for adoption, particularly in commercial applications. Data preparation is manual and must follow the expected directory structures exactly.
