stable-diffusion-deploy by Lightning-Universe

Production server for Stable Diffusion model deployment

created 2 years ago · 391 stars · Top 74.6% on sourcepulse

View on GitHub
Project Summary

This project provides a production-ready deployment blueprint for Stable Diffusion models, targeting developers and researchers who need to serve AI art generation at scale. It demonstrates a robust architecture for load balancing, dynamic batching, and microservice orchestration using the Lightning Apps framework, enabling efficient GPU inference and autoscaling.
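
In Lightning Apps terms, that architecture boils down to a root flow coordinating independent works. The skeleton below is a minimal sketch, not the repository's actual code; it assumes the lightning.app namespace from the Lightning 1.8/2.x era, and the ModelServer/UI works are placeholders.

    # Minimal orchestration sketch (not the repo's code): a root LightningFlow
    # coordinates two LightningWorks standing in for the API and the UI.
    from lightning.app import LightningApp, LightningFlow, LightningWork

    class ModelServer(LightningWork):
        def run(self):
            # The real project starts the GPU-backed REST API here.
            print(f"model server listening on {self.url}")

    class UI(LightningWork):
        def run(self):
            # The real project serves the React.js frontend here.
            print(f"UI available at {self.url}")

    class RootFlow(LightningFlow):
        def __init__(self):
            super().__init__()
            # parallel=True lets each work run as its own process/machine.
            self.server = ModelServer(parallel=True)
            self.ui = UI(parallel=True)

        def run(self):
            self.server.run()
            self.ui.run()

    app = LightningApp(RootFlow())

Launching such an app starts each work as its own service, which is what makes per-component scaling possible.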

How It Works

The system uses the Lightning Apps framework to orchestrate multiple microservices: a frontend UI, a backend REST API for model inference, and a load balancer. Inference runs on PyTorch with GPU acceleration, using dynamic batching to maximize throughput. A safety checker filters NSFW content, substituting a placeholder image when output is flagged. The architecture is designed for cloud deployment and autoscaling under load.
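
The batching idea can be shown in a few lines: incoming requests wait in a queue and are flushed to the GPU either when a full batch has accumulated or when a short timeout expires. This is an illustrative sketch with hypothetical names (batch_worker, run_model), not the project's implementation.

    import asyncio

    MAX_BATCH_SIZE = 8      # flush as soon as this many requests are queued
    BATCH_TIMEOUT_S = 0.05  # ...or after this long, whichever comes first

    async def batch_worker(queue: asyncio.Queue, run_model):
        while True:
            # Block until at least one request arrives.
            prompt, future = await queue.get()
            batch = [(prompt, future)]
            deadline = asyncio.get_running_loop().time() + BATCH_TIMEOUT_S
            # Collect more requests until the batch fills or the timeout expires.
            while len(batch) < MAX_BATCH_SIZE:
                remaining = deadline - asyncio.get_running_loop().time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One GPU forward pass serves every request in the batch.
            images = run_model([p for p, _ in batch])
            for (_, fut), image in zip(batch, images):
                fut.set_result(image)

A request handler would put (prompt, future) pairs on the queue and await the future, so throughput grows with batch size while per-request latency stays bounded by the timeout.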

Quick Start & Requirements

  • Install: Clone the repository and run bash dev_install.sh.
  • Run Locally: python -m lightning run app app.py (a request sketch follows after this list)
  • Run on Cloud: python -m lightning run app app.py --cloud
  • Prerequisites: Python 3.9, Conda. GPU is recommended for inference.
  • Docs: PyTorch Lightning Docs
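
Once the app is running locally, a client can exercise the REST API directly. In this hypothetical example the /predict route, the port, and the base64 image field are assumptions to adapt to the deployed app, not documented endpoints.

    import base64
    import requests

    resp = requests.post(
        "http://127.0.0.1:7501/predict",   # assumed local URL and route
        json={"prompt": "a watercolor fox in a misty forest"},
        timeout=120,                        # image generation can be slow
    )
    resp.raise_for_status()
    image_b64 = resp.json()["image"]        # assumed response field
    with open("out.png", "wb") as f:
        f.write(base64.b64decode(image_b64))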

Highlighted Details

  • Demonstrates a full React.js UI and microservice orchestration.
  • Features dynamic GPU batching for inference requests.
  • Includes load balancing with autoscaling infrastructure.
  • Integrates with Slack via a Slack Command Bot Component.
  • Supports load testing with Locust (see the sketch below).
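
A minimal Locust file for the load-testing workflow mentioned above could look like this; the /predict path and payload are assumptions to match to the actual API.

    from locust import HttpUser, between, task

    class DiffusionUser(HttpUser):
        # Simulated users pause 1-3 seconds between requests.
        wait_time = between(1, 3)

        @task
        def generate(self):
            self.client.post("/predict", json={"prompt": "a city skyline at dusk"})

Running locust -f locustfile.py --host http://127.0.0.1:7501 then ramps up concurrent users against the server.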

Maintenance & Community

  • Project is part of the Lightning AI ecosystem.
  • Community support available via Slack.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The provided inference code does not explicitly handle GPU out-of-memory errors, and the non-GPU fallback simply sleeps and returns a randomly generated image rather than performing real CPU inference. The Slack integration requires obtaining multiple API tokens and secrets from the Slack API.
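
One possible mitigation for the missing OOM handling, sketched here under the assumption of a diffusers-style pipeline (pipe) that accepts a list of prompts: catch the out-of-memory RuntimeError, free cached allocations, and retry on smaller batches.

    import torch

    def generate_with_backoff(pipe, prompts):
        # Run the pipeline, splitting the batch in half on GPU OOM.
        if not prompts:
            return []
        try:
            return pipe(prompts).images
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or if we cannot shrink further.
            if "out of memory" not in str(err) or len(prompts) == 1:
                raise
            torch.cuda.empty_cache()  # release cached blocks before retrying
            mid = len(prompts) // 2
            return (generate_with_backoff(pipe, prompts[:mid])
                    + generate_with_backoff(pipe, prompts[mid:]))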

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Luca Antiga (CTO of Lightning AI), and 1 more.

LitServe by Lightning-AI: AI inference pipeline framework
1.4% · 3k stars · created 1 year ago · updated 3 days ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack: GPU cluster manager for AI model deployment
1.5% · 3k stars · created 1 year ago · updated 3 days ago

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

dynamo by ai-dynamo: Inference framework for distributed generative AI model serving
1.2% · 5k stars · created 5 months ago · updated 11 hours ago

Starred by Anton Bukov (Cofounder of 1inch Network), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 9 more.

exo by exo-explore: AI cluster for running models on diverse devices
0.3% · 29k stars · created 1 year ago · updated 4 months ago