pytorch-stable-diffusion by hkproj

PyTorch code for Stable Diffusion image generation

Created 2 years ago
962 stars

Top 38.3% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of Stable Diffusion, a text-to-image generation model. It is designed for researchers and developers who want to understand and experiment with the core components of Stable Diffusion without relying on higher-level libraries. The project allows users to generate images from text prompts using pre-trained Stable Diffusion models.

How It Works

The implementation is written from scratch in PyTorch, exposing the model's architecture and forward pass directly. It loads pre-trained weights and tokenizer files, and the README states compatibility with Stable Diffusion v1.5 and several fine-tuned models. At its core, the code loads these components and turns a text prompt into a corresponding image.
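
As a rough illustration of the "loading tokenizer files" step, the sketch below reads vocab.json and merges.txt using only the standard library. It is not the repository's code: the function name load_tokenizer_files is hypothetical, and the file layout (a JSON object mapping tokens to integer ids, and one space-separated merge pair per line after an optional "#version" header) is an assumption based on the usual BPE tokenizer file format.

```python
import json


def load_tokenizer_files(vocab_path, merges_path):
    """Return (token_to_id, merges) read from BPE tokenizer files.

    Assumptions (not confirmed by the README):
      - vocab.json is a JSON object mapping token strings to integer ids
      - merges.txt holds one space-separated merge pair per line,
        optionally preceded by a '#version' header line
    """
    with open(vocab_path, encoding="utf-8") as f:
        token_to_id = json.load(f)

    merges = []
    with open(merges_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            # Skip blank lines, the header line, and anything malformed.
            if len(parts) != 2 or line.startswith("#version"):
                continue
            merges.append((parts[0], parts[1]))

    return token_to_id, merges
```

Calling load_tokenizer_files("data/vocab.json", "data/merges.txt") would return the vocabulary dictionary and the ordered merge list, which a tokenizer could then use to encode a prompt.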

Quick Start & Requirements

  • Install: Clone the repository and download necessary model weights (v1-5-pruned-emaonly.ckpt) and tokenizer files (vocab.json, merges.txt) into a data folder.
  • Prerequisites: Python and PyTorch. The README does not state specific versions.
  • Resources: Requires downloading model checkpoints, which can be several gigabytes.
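
Before running anything, it can help to confirm that the downloads landed where the install step expects them. The following is a minimal sketch, assuming the folder is named data and sits in the current working directory; the helper name missing_files is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the install step above.
REQUIRED_FILES = ("v1-5-pruned-emaonly.ckpt", "vocab.json", "merges.txt")


def missing_files(data_dir="data"):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]
```

An empty list from missing_files() means all three files are in place; otherwise the returned names indicate what still needs to be downloaded.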

Highlighted Details

  • Implemented from scratch in PyTorch for educational and research purposes.
  • Supports Stable Diffusion v1.5 and various fine-tuned models (e.g., InkPunk Diffusion, Illustration Diffusion).
  • Provides direct access to model components for deeper understanding.

Maintenance & Community

The README credits several other Stable Diffusion implementations: CompVis, divamgupta, kjsman, and Hugging Face's diffusers library. It lists no community channels and gives no signals of active maintenance.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

This implementation is a foundational PyTorch port and may lack the optimizations, features, or user-friendly abstractions found in more mature libraries like Hugging Face's diffusers. The README does not detail performance benchmarks or specific hardware requirements beyond general PyTorch compatibility.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 16 stars in the last 30 days
