stable-diffusion-deploy by Lightning-Universe

Production server for Stable Diffusion model deployment

created 2 years ago · 391 stars · Top 74.6% on sourcepulse

View on GitHub
Project Summary

This project provides a production-ready deployment blueprint for Stable Diffusion models, targeting developers and researchers who need to serve AI art generation at scale. It demonstrates a robust architecture for load balancing, dynamic batching, and microservice orchestration using the Lightning Apps framework, enabling efficient GPU inference and autoscaling.
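
In Lightning Apps terms, that architecture boils down to a root flow coordinating independent works. The skeleton below is a minimal sketch, not the repository's actual code; it assumes the lightning.app namespace from the Lightning 1.8/2.x era, and the ModelServer/UI works are placeholders.

    # Minimal orchestration sketch (not the repo's code): a root LightningFlow
    # coordinates two LightningWorks standing in for the API and the UI.
    from lightning.app import LightningApp, LightningFlow, LightningWork

    class ModelServer(LightningWork):
        def run(self):
            # The real project starts the GPU-backed REST API here.
            print(f"model server listening on {self.url}")

    class UI(LightningWork):
        def run(self):
            # The real project serves the React.js frontend here.
            print(f"UI available at {self.url}")

    class RootFlow(LightningFlow):
        def __init__(self):
            super().__init__()
            # parallel=True lets each work run as its own process/machine.
            self.server = ModelServer(parallel=True)
            self.ui = UI(parallel=True)

        def run(self):
            self.server.run()
            self.ui.run()

    app = LightningApp(RootFlow())

Launching such an app starts each work as its own service, which is what makes per-component scaling possible.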

How It Works

The system uses the Lightning Apps framework to orchestrate multiple microservices: a frontend UI, a backend REST API for model inference, and a load balancer. Inference runs on PyTorch with GPU acceleration, using dynamic batching to maximize throughput. A safety checker filters NSFW content, substituting a placeholder image when output is flagged. The architecture is designed for cloud deployment and autoscaling under load.
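
The batching idea can be shown in a few lines: incoming requests wait in a queue and are flushed to the GPU either when a full batch has accumulated or when a short timeout expires. This is an illustrative sketch with hypothetical names (batch_worker, run_model), not the project's implementation.

    import asyncio

    MAX_BATCH_SIZE = 8      # flush as soon as this many requests are queued
    BATCH_TIMEOUT_S = 0.05  # ...or after this long, whichever comes first

    async def batch_worker(queue: asyncio.Queue, run_model):
        while True:
            # Block until at least one request arrives.
            prompt, future = await queue.get()
            batch = [(prompt, future)]
            deadline = asyncio.get_running_loop().time() + BATCH_TIMEOUT_S
            # Collect more requests until the batch fills or the timeout expires.
            while len(batch) < MAX_BATCH_SIZE:
                remaining = deadline - asyncio.get_running_loop().time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One GPU forward pass serves every request in the batch.
            images = run_model([p for p, _ in batch])
            for (_, fut), image in zip(batch, images):
                fut.set_result(image)

A request handler would put (prompt, future) pairs on the queue and await the future, so throughput grows with batch size while per-request latency stays bounded by the timeout.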

Quick Start & Requirements

  • Install: Clone the repository and run bash dev_install.sh.
  • Run Locally: python -m lightning run app app.py (a request sketch follows after this list)
  • Run on Cloud: python -m lightning run app app.py --cloud
  • Prerequisites: Python 3.9, Conda. GPU is recommended for inference.
  • Docs: PyTorch Lightning Docs
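
Once the app is running locally, a client can exercise the REST API directly. In this hypothetical example the /predict route, the port, and the base64 image field are assumptions to adapt to the deployed app, not documented endpoints.

    import base64
    import requests

    resp = requests.post(
        "http://127.0.0.1:7501/predict",   # assumed local URL and route
        json={"prompt": "a watercolor fox in a misty forest"},
        timeout=120,                        # image generation can be slow
    )
    resp.raise_for_status()
    image_b64 = resp.json()["image"]        # assumed response field
    with open("out.png", "wb") as f:
        f.write(base64.b64decode(image_b64))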

Highlighted Details

  • Demonstrates a full React.js UI and microservice orchestration.
  • Features dynamic GPU batching for inference requests.
  • Includes load balancing with autoscaling infrastructure.
  • Integrates with Slack via a Slack Command Bot Component.
  • Supports load testing with Locust (see the sketch below).
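
A minimal Locust file for the load-testing workflow mentioned above could look like this; the /predict path and payload are assumptions to match to the actual API.

    from locust import HttpUser, between, task

    class DiffusionUser(HttpUser):
        # Simulated users pause 1-3 seconds between requests.
        wait_time = between(1, 3)

        @task
        def generate(self):
            self.client.post("/predict", json={"prompt": "a city skyline at dusk"})

Running locust -f locustfile.py --host http://127.0.0.1:7501 then ramps up concurrent users against the server.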

Maintenance & Community

  • Project is part of the Lightning AI ecosystem.
  • Community support available via Slack.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The provided inference code does not explicitly handle GPU out-of-memory errors, and the non-GPU fallback simply sleeps and returns a randomly generated image rather than performing real CPU inference. The Slack integration requires obtaining multiple API tokens and secrets from the Slack API.
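
One possible mitigation for the missing OOM handling, sketched here under the assumption of a diffusers-style pipeline (pipe) that accepts a list of prompts: catch the out-of-memory RuntimeError, free cached allocations, and retry on smaller batches.

    import torch

    def generate_with_backoff(pipe, prompts):
        # Run the pipeline, splitting the batch in half on GPU OOM.
        if not prompts:
            return []
        try:
            return pipe(prompts).images
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or if we cannot shrink further.
            if "out of memory" not in str(err) or len(prompts) == 1:
                raise
            torch.cuda.empty_cache()  # release cached blocks before retrying
            mid = len(prompts) // 2
            return (generate_with_backoff(pipe, prompts[:mid])
                    + generate_with_backoff(pipe, prompts[mid:]))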

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Luca Antiga (CTO of Lightning AI), and 1 more.

LitServe by Lightning-AI: AI inference pipeline framework
1.4% · 3k stars · created 1 year ago · updated 3 days ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack: GPU cluster manager for AI model deployment
1.5% · 3k stars · created 1 year ago · updated 3 days ago

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

dynamo by ai-dynamo: Inference framework for distributed generative AI model serving
1.2% · 5k stars · created 5 months ago · updated 11 hours ago

Starred by Anton Bukov (Cofounder of 1inch Network), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 9 more.

exo by exo-explore: AI cluster for running models on diverse devices
0.3% · 29k stars · created 1 year ago · updated 4 months ago