stable-diffusion-deploy by Lightning-Universe

Production server for Stable Diffusion model deployment

Created 3 years ago
391 stars

Top 73.5% on SourcePulse

Project Summary

This project provides a production-ready deployment blueprint for Stable Diffusion models, targeting developers and researchers who need to serve AI image generation at scale. It demonstrates a robust architecture for load balancing, dynamic batching, and microservice orchestration built on the Lightning Apps framework, enabling efficient GPU inference and autoscaling.

How It Works

The system uses the Lightning Apps framework to orchestrate several microservices: a frontend UI, a backend REST API for model inference, and a load balancer. Inference runs on PyTorch with GPU acceleration, using dynamic batching to maximize throughput. A safety checker filters NSFW content, replacing flagged outputs with a placeholder image. The architecture is designed for cloud deployment and autoscales with load.
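The dynamic-batching idea described above, collecting concurrent requests and running them through the model as one batch, can be sketched as follows. This is a minimal illustration, not the project's actual code; all names here are hypothetical:

```python
import queue
import threading
import time


class DynamicBatcher:
    """Collect incoming requests and run them through the model in batches."""

    def __init__(self, infer_fn, max_batch_size=8, max_wait_s=0.05):
        self.infer_fn = infer_fn            # batched model call: list -> list
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s        # how long to wait to fill a batch
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, prompt):
        """Enqueue one prompt and block until its result is ready."""
        done = threading.Event()
        holder = {}
        self.requests.put((prompt, done, holder))
        done.wait()
        return holder["result"]

    def _loop(self):
        while True:
            batch = [self.requests.get()]   # block for the first request
            deadline = time.monotonic() + self.max_wait_s
            # Fill the batch until it is full or the wait budget runs out.
            while len(batch) < self.max_batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=timeout))
                except queue.Empty:
                    break
            prompts = [p for p, _, _ in batch]
            results = self.infer_fn(prompts)  # one batched model call
            for (_, done, holder), result in zip(batch, results):
                holder["result"] = result
                done.set()
```

In the real app the `infer_fn` would be the GPU diffusion pipeline; here any batched callable works, e.g. `DynamicBatcher(lambda ps: [p.upper() for p in ps])`.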

Quick Start & Requirements

  • Install: Clone the repository and run bash dev_install.sh.
  • Run Locally: python -m lightning run app app.py
  • Run on Cloud: python -m lightning run app app.py --cloud
  • Prerequisites: Python 3.9, Conda. GPU is recommended for inference.
  • Docs: PyTorch Lightning Docs
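Put together, a typical setup and run looks like the transcript below (assuming a Unix shell and the GitHub URL implied by the project and owner names; a GPU machine or the `--cloud` flag is needed for real inference):

```shell
# Clone and install (requires Python 3.9 and Conda)
git clone https://github.com/Lightning-Universe/stable-diffusion-deploy.git
cd stable-diffusion-deploy
bash dev_install.sh

# Run locally
python -m lightning run app app.py

# Run on the Lightning cloud
python -m lightning run app app.py --cloud
```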

Highlighted Details

  • Demonstrates a full React.js UI and microservice orchestration.
  • Features dynamic GPU batching for inference requests.
  • Includes load balancing with autoscaling infrastructure.
  • Integrates with Slack via a Slack Command Bot Component.
  • Supports load testing with Locust.

Maintenance & Community

  • Project is part of the Lightning AI ecosystem.
  • Community support available via Slack.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The provided inference code does not explicitly handle GPU out-of-memory errors, and the non-GPU fallback simply sleeps and returns a randomly generated image rather than performing real CPU inference. The Slack integration requires obtaining several API tokens and secrets from the Slack API.
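One way to add the missing out-of-memory handling is a guarded inference call that halves the batch size and retries. This is a sketch under assumptions, not the project's code: `pipeline` is any batched callable, and OOM is detected via the "out of memory" text PyTorch puts in its `RuntimeError` messages:

```python
def safe_generate(pipeline, prompts, batch_size):
    """Run batched inference, halving the batch size on out-of-memory errors.

    `pipeline` is any callable mapping a list of prompts to a list of images.
    """
    while batch_size >= 1:
        try:
            results = []
            for i in range(0, len(prompts), batch_size):
                results.extend(pipeline(prompts[i:i + batch_size]))
            return results
        except RuntimeError as err:
            # PyTorch signals CUDA OOM as a RuntimeError containing this text.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2  # retry the whole request with smaller batches
    raise RuntimeError("out of memory even with batch_size=1")
```

In a real deployment one would also call `torch.cuda.empty_cache()` before retrying to release cached GPU blocks.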

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

GPU cluster manager for AI model deployment

  • 4k stars · Top 1.3% on SourcePulse
  • Created 1 year ago · Updated 1 day ago
  • Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

Serve, optimize, and scale PyTorch models in production

  • 4k stars · Top 0.1% on SourcePulse
  • Created 6 years ago · Updated 1 month ago