gpustack by gpustack

GPU cluster manager for AI model deployment

Created 1 year ago
3,728 stars

Top 13.0% on SourcePulse

Project Summary

GPUStack is an open-source platform for deploying and serving AI models across diverse GPU hardware, targeting developers and researchers needing scalable inference solutions. It simplifies the process of running various AI models, including LLMs and diffusion models, by providing a unified interface and OpenAI-compatible APIs, enabling efficient distributed inference and resource management.

How It Works

GPUStack acts as a cluster manager, abstracting away hardware complexities and offering a consistent API layer. It supports multiple inference backends like vLLM, llama.cpp, and stable-diffusion.cpp, allowing users to leverage different model optimizations. The system is designed for scalability, enabling seamless addition of GPUs and nodes, and supports both single-node multi-GPU and multi-node distributed inference configurations.
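As a concrete sketch, a multi-node cluster is typically assembled by starting a server and pointing workers at it. The flags below (`--server-url`, `--token`) and the addresses are illustrative placeholders based on the GPUStack CLI docs, not commands verified here; it is shown as a dry run (remove the `echo` prefixes to actually execute on the respective machines):

```shell
# Sketch of a two-node GPUStack cluster setup (hedged; consult the official docs).
SERVER_IP="192.0.2.10"   # placeholder server address
TOKEN="YOUR_TOKEN"       # placeholder; issued by the server on first start

# On the server node: start the GPUStack server (also runs a local worker).
echo gpustack start

# On each worker node: register with the server using its URL and auth token.
echo gpustack start --server-url "http://$SERVER_IP" --token "$TOKEN"
```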

Quick Start & Requirements

  • Installation: A single script handles installation as a service on Linux/macOS (curl -sfL https://get.gpustack.ai | sh -s -) or Windows (Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content). Docker and manual installation options are available in the documentation.
  • Prerequisites: Python 3.10-3.12. Supports NVIDIA CUDA (Compute Capability 6.0+), Apple Metal, AMD ROCm, Ascend CANN, Hygon DTK, and Moore Threads MUSA.
  • Resources: Running stable-diffusion-v3-5-large-turbo requires roughly 12 GB of VRAM, plus corresponding disk space for the model weights.
  • Docs: Official Docs

Highlighted Details

  • Supports a wide array of AI models: LLMs (LLaMA, Mistral), VLMs, Diffusion Models (Stable Diffusion), Embedding Models, and Audio Models (Whisper).
  • Provides OpenAI-compatible APIs for common tasks like chat completions, embeddings, and image generation.
  • Features user and API key management, GPU metrics monitoring, and token usage tracking.
  • Offers broad hardware compatibility, including Apple Silicon Macs, Windows PCs, and Linux servers.
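Because the APIs are OpenAI-compatible, any OpenAI-style client can talk to a GPUStack deployment. Below is a minimal sketch of a chat-completions request using only the Python standard library; the host, API key, and model name are placeholders (not values from the README), and the actual request is left commented out since it needs a live server:

```python
import json
import urllib.request

BASE_URL = "http://localhost/v1"      # placeholder: your GPUStack server
API_KEY = "YOUR_GPUSTACK_API_KEY"     # placeholder: created in the GPUStack UI
MODEL = "llama-3.2-3b-instruct"       # placeholder: a model you have deployed

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict) -> dict:
    """POST the payload to the OpenAI-compatible endpoint (needs a live server)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = chat_payload(MODEL, "Say hello in one short sentence.")
# response = send(payload)  # uncomment against a running GPUStack server
```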

Maintenance & Community

GPUStack is licensed under the Apache License 2.0. Community support is available through the project's community channels, though the README does not link them directly.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. This permissive license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

Accelerator support for Intel oneAPI and Qualcomm AI Engine is planned but not yet available. The README also notes that the initial admin password must be retrieved from a file on the server, a minor security consideration to handle carefully during first-time setup.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 33
  • Issues (30d): 115
  • Star History: 373 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

0.3% | 4k stars
AI inference pipeline framework
Created 1 year ago | Updated 1 day ago
Starred by Jeff Hammerbacher (cofounder of Cloudera), Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

0.1% | 4k stars
Serve, optimize, and scale PyTorch models in production
Created 6 years ago | Updated 1 month ago