gpustack by gpustack

GPU cluster manager for AI model deployment

created 1 year ago
3,177 stars

Top 15.5% on sourcepulse

Project Summary

GPUStack is an open-source platform for deploying and serving AI models across diverse GPU hardware, targeting developers and researchers needing scalable inference solutions. It simplifies the process of running various AI models, including LLMs and diffusion models, by providing a unified interface and OpenAI-compatible APIs, enabling efficient distributed inference and resource management.

How It Works

GPUStack acts as a cluster manager, abstracting away hardware complexities and offering a consistent API layer. It supports multiple inference backends like vLLM, llama.cpp, and stable-diffusion.cpp, allowing users to leverage different model optimizations. The system is designed for scalability, enabling seamless addition of GPUs and nodes, and supports both single-node multi-GPU and multi-node distributed inference configurations.
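To make the unified API layer concrete, here is a minimal sketch of how a client would talk to a GPUStack deployment through its OpenAI-compatible chat completions endpoint. The base URL, API key, and model name are placeholders, not values from the README; substitute whatever your own install reports.

```python
import json
import urllib.request

# Placeholder values -- substitute your GPUStack server address,
# an API key created in the GPUStack UI, and a deployed model's name.
BASE_URL = "http://localhost:80/v1"       # assumed address; check your install
API_KEY = "gpustack-api-key-placeholder"  # hypothetical key
MODEL = "llama-3.2-1b-instruct"           # hypothetical deployed model

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
# Sending requires a running server: urllib.request.urlopen(req)
```

Because the API follows the OpenAI wire format, existing OpenAI SDKs can also be pointed at the server simply by overriding their base URL.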

Quick Start & Requirements

  • Installation: A single script handles installation as a service on Linux/macOS (curl -sfL https://get.gpustack.ai | sh -s -) or Windows (Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content). Docker and manual installation options are available in the documentation.
  • Prerequisites: Python 3.10-3.12. Supports NVIDIA CUDA (Compute Capability 6.0+), Apple Metal, AMD ROCm, Ascend CANN, Hygon DTK, and Moore Threads MUSA.
  • Resources: Running stable-diffusion-v3-5-large-turbo requires roughly 12 GB of VRAM and a comparable amount of disk space for the model weights.
  • Docs: Official Docs

Highlighted Details

  • Supports a wide array of AI models: LLMs (LLaMA, Mistral), VLMs, Diffusion Models (Stable Diffusion), Embedding Models, and Audio Models (Whisper).
  • Provides OpenAI-compatible APIs for common tasks like chat completions, embeddings, and image generation.
  • Features user and API key management, GPU metrics monitoring, and token usage tracking.
  • Offers broad hardware compatibility, including Apple Silicon Macs, Windows PCs, and Linux servers.
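The OpenAI-compatible surface covers more than chat; an embeddings request, for example, differs only in the endpoint path and payload shape. As above, the server address, API key, and model name are assumed placeholders.

```python
import json
import urllib.request

# Placeholder values; substitute your server, key, and a deployed embedding model.
BASE_URL = "http://localhost:80/v1"       # assumed address
API_KEY = "gpustack-api-key-placeholder"  # hypothetical key

def build_embeddings_request(texts: list[str], model: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_embeddings_request(["hello world"], model="bge-m3")  # model name is illustrative
```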

Maintenance & Community

GPUStack is licensed under the Apache License 2.0. Community support is available through the project's community channels (links are not provided in the README).

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. This permissive license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

Accelerator support for Intel oneAPI and Qualcomm AI Engine is planned but not yet available. The README also notes that the default admin password must be read from a file on the server after installation, a minor security consideration if initial setup is not handled carefully.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 116
  • Issues (30d): 220
  • Star History: 596 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
created 5 years ago
updated 3 weeks ago
Starred by Anton Bukov (Cofounder of 1inch Network), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 9 more.

exo by exo-explore

0.3%
29k
AI cluster for running models on diverse devices
created 1 year ago
updated 4 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4%
84k
C/C++ library for local LLM inference
created 2 years ago
updated 9 hours ago