stable-diffusion-pytorch  by kjsman

PyTorch SDK for Stable Diffusion

Created 2 years ago
590 stars

Top 55.1% on SourcePulse

Project Summary

This repository provides a minimal, self-contained PyTorch implementation of Stable Diffusion, targeting developers and researchers seeking a readable and hackable codebase for text-to-image generation. It offers core Stable Diffusion functionalities with a focus on clarity and ease of modification, enabling rapid experimentation with various generation parameters.

How It Works

The implementation is a compact PyTorch codebase whose configuration is hard-coded to match Stable Diffusion v1.x. It prioritizes readability and includes the essential components of a diffusion pipeline, such as the samplers and the generation pipeline itself. The design aims to be hackable, so users can modify or extend it easily; loops are unrolled where the shape allows.
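To make the role of a sampler concrete, here is a minimal sketch of Euler sampling over a decreasing noise schedule, in the style used by k-diffusion samplers. It is not taken from the repository: `euler_sample` and `toy_denoise` are hypothetical names, and a single float stands in for the latent tensor so the example runs without PyTorch.

```python
def euler_sample(x, sigmas, denoise):
    """Run Euler steps over a decreasing noise schedule `sigmas` that ends at 0."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)        # model's estimate of the clean sample
        d = (x - denoised) / sigma          # estimated derivative dx/dsigma
        x = x + d * (sigma_next - sigma)    # Euler step toward the next noise level
    return x


def toy_denoise(x, sigma):
    """Stand-in for the U-Net (assumption): always predicts a clean value of 1.0."""
    return 1.0


sigmas = [10.0, 5.0, 1.0, 0.0]              # illustrative schedule, not the real one
result = euler_sample(12.0, sigmas, toy_denoise)
print(result)
```

Because the final step goes to a noise level of 0, the sample lands on the denoiser's last prediction. In a real pipeline the denoiser would be the text-conditioned U-Net and `x` a latent tensor.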

Quick Start & Requirements

  • Install dependencies with pip install torch numpy Pillow regex, or with pip install -r requirements.txt.
  • Download data.v20221029.tar and unpack it into the parent directory of the cloned repository.
  • Requires PyTorch, NumPy, Pillow, regex, and tqdm.
  • API documentation is provided in the docstring of the stable_diffusion_pytorch.pipeline.generate function.
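Before running the pipeline, it can help to confirm that the listed dependencies are importable. The following is a minimal sketch using only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names, not part of the repository. Note that Pillow is imported under the module name PIL.

```python
from importlib.util import find_spec

# Import names for the packages listed above. Pillow is imported as "PIL".
REQUIRED_MODULES = ["torch", "numpy", "PIL", "regex", "tqdm"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only checks that each module can be located; it does not verify versions or that the model data has been unpacked.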

Highlighted Details

  • Supports text-to-image and image-to-image generation.
  • Allows customization of prompts, negative prompts, seeds, guidance scale, inference steps, and image dimensions.
  • Offers flexibility in sampler choice (k_lms, k_euler, k_euler_ancestral).
  • Provides options for managing model loading to CPU or GPU based on available VRAM.
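The guidance scale and the negative prompt are typically combined through classifier-free guidance, where the model's prediction for the prompt is pushed away from its prediction for the negative (or empty) prompt. The sketch below illustrates that blend on plain Python lists instead of tensors; `apply_guidance` is a hypothetical helper, not the repository's code.

```python
def apply_guidance(cond, uncond, scale):
    """Blend two noise predictions: uncond + scale * (cond - uncond)."""
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -1.0, 2.0]    # prediction conditioned on the prompt
uncond = [0.0, -1.0, 1.0]  # prediction for the negative (or empty) prompt

# A scale of 1.0 returns the conditional prediction unchanged;
# larger scales exaggerate the difference between the two predictions.
print(apply_guidance(cond, uncond, 1.0))
print(apply_guidance(cond, uncond, 7.5))
```

A scale of 0.0 would ignore the prompt entirely, which is why values well above 1.0 are the usual choice when stronger prompt adherence is wanted.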

Maintenance & Community

The project is maintained by kjsman. No specific community channels or roadmap details are provided in the README.

Licensing & Compatibility

All code is licensed under the MIT License. However, the included checkpoint files are subject to the CreativeML Open RAIL-M License, which includes use-based restrictions. Users must adhere to this license for checkpoint usage.

Limitations & Caveats

The README notes that configurations are hard-coded for Stable Diffusion v1.x. Although the project aims for clarity, the author humorously describes the codebase as potentially "spaghetti," which suggests a learning curve for deep modifications. The CreativeML Open RAIL-M License restricts certain uses of the checkpoints, so users should review its use-based restrictions before deploying them.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
