stable-diffusion-pytorch  by kjsman

PyTorch SDK for Stable Diffusion

created 2 years ago
584 stars

Top 56.3% on sourcepulse

Project Summary

This repository provides a minimal, self-contained PyTorch implementation of Stable Diffusion, targeting developers and researchers seeking a readable and hackable codebase for text-to-image generation. It offers core Stable Diffusion functionalities with a focus on clarity and ease of modification, enabling rapid experimentation with various generation parameters.

How It Works

The implementation is plain PyTorch, with configurations hard-coded for Stable Diffusion v1.x. It prioritizes readability and includes the essential components of a diffusion model: the samplers and the generation pipeline. The code is meant to be hackable, so users can modify or extend it easily, and loops are unrolled where the shape allows.
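To make the role of a sampler concrete, here is a minimal sketch of an Euler-style denoising loop in plain Python. It is not taken from the repository: the names `euler_sample` and `toy_denoiser`, the noise schedule, and the scalar "image" are illustrative assumptions. A real sampler works on latent tensors and calls a trained U-Net where the toy denoiser stands.

```python
from typing import Callable, Sequence


def euler_sample(
    x: float,
    sigmas: Sequence[float],
    denoise: Callable[[float, float], float],
) -> float:
    """Run Euler steps over a decreasing noise schedule (hypothetical helper)."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)      # model's guess of the clean sample
        d = (x - denoised) / sigma        # estimated derivative dx/dsigma
        x = x + d * (sigma_next - sigma)  # Euler step down to the next noise level
    return x


def toy_denoiser(x: float, sigma: float) -> float:
    """Stand-in for a trained model: always predicts the same clean value."""
    return 0.25


# Assumed schedule: noise levels decrease and end at 0 (fully denoised).
sigmas = [8.0, 4.0, 2.0, 1.0, 0.0]
start = 0.25 + 8.0 * 1.3  # clean value plus noise scaled by the first sigma
result = euler_sample(start, sigmas, toy_denoiser)
```

Because the toy denoiser always returns the same clean value, each step shrinks the remaining noise in proportion to the schedule, and the final step at sigma 0 lands on that value (up to floating-point rounding).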

Quick Start & Requirements

  • Install dependencies with either pip install torch numpy Pillow regex or pip install -r requirements.txt.
  • Download data.v20221029.tar and unpack it into the parent directory of the cloned repository.
  • Requires PyTorch, NumPy, Pillow, regex, and tqdm.
  • Documentation is provided in the docstring of the stable_diffusion_pytorch.pipeline.generate function.
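Before running anything, the setup steps above can be sanity-checked with a short script. This is only a sketch: `missing_modules` and `data_dir_present` are hypothetical helper names, and the unpacked folder name `data` is an assumption based on the tarball name, so adjust it if the archive unpacks differently.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List

# Import names for the dependencies listed above (Pillow is imported as "PIL").
REQUIRED_MODULES = ["torch", "numpy", "PIL", "regex", "tqdm"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def data_dir_present(root: Path, dirname: str = "data") -> bool:
    """Check for the unpacked data directory; "data" is an assumed folder name."""
    return (root / dirname).is_dir()
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when every dependency is installed, and `data_dir_present(Path("path/to/unpack/location"))` reports whether the data folder is where you expect it.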

Highlighted Details

  • Supports text-to-image and image-to-image generation.
  • Allows customization of prompts, negative prompts, seeds, guidance scale, inference steps, and image dimensions.
  • Offers flexibility in sampler choice (k_lms, k_euler, k_euler_ancestral).
  • Provides options for managing model loading to CPU or GPU based on available VRAM.
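Guidance scale and negative prompts are commonly tied together through classifier-free guidance. The sketch below shows the standard formula on plain Python lists. Whether the repository applies it in exactly this form is an assumption, `apply_guidance` is a hypothetical name, and the numbers are toy values, not model outputs.

```python
from typing import List


def apply_guidance(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: move the prediction away from the unconditional
    (or negative-prompt) output and toward the prompt-conditioned one."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0, 3.0]    # toy prediction conditioned on the prompt
uncond = [0.5, 2.0, 4.0]  # toy prediction for the empty or negative prompt

no_guidance = apply_guidance(cond, uncond, 1.0)  # scale 1 reproduces cond
guided = apply_guidance(cond, uncond, 7.5)       # larger scale exaggerates the difference
```

A scale of 1 leaves the conditioned prediction unchanged, while larger values amplify whatever separates it from the unconditional one, which is why a higher guidance scale follows the prompt more strictly.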

Maintenance & Community

The project is maintained by kjsman. No specific community channels or roadmap details are provided in the README.

Licensing & Compatibility

All code is licensed under the MIT License. However, the included checkpoint files are subject to the CreativeML Open RAIL-M License, which includes use-based restrictions. Users must adhere to this license for checkpoint usage.

Limitations & Caveats

The README notes that configurations are hard-coded for Stable Diffusion v1.x, so other model versions are not supported without code changes. Although the code aims for clarity, the author humorously describes it as potentially "spaghetti," which suggests a learning curve for deep modifications. The checkpoints remain subject to the use-based restrictions of the CreativeML Open RAIL-M License even though the code itself is MIT-licensed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 14 stars in the last 90 days
