gpustack by gpustack

GPU cluster manager for AI model deployment

created 1 year ago
3,177 stars

Top 15.5% on sourcepulse

Project Summary

GPUStack is an open-source platform for deploying and serving AI models across diverse GPU hardware, targeting developers and researchers needing scalable inference solutions. It simplifies the process of running various AI models, including LLMs and diffusion models, by providing a unified interface and OpenAI-compatible APIs, enabling efficient distributed inference and resource management.

How It Works

GPUStack acts as a cluster manager, abstracting away hardware complexities and offering a consistent API layer. It supports multiple inference backends like vLLM, llama.cpp, and stable-diffusion.cpp, allowing users to leverage different model optimizations. The system is designed for scalability, enabling seamless addition of GPUs and nodes, and supports both single-node multi-GPU and multi-node distributed inference configurations.
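To make the unified API layer concrete, here is a minimal sketch of how a client would talk to a GPUStack deployment through its OpenAI-compatible chat completions endpoint. The base URL, API key, and model name are placeholders, not values from the README; substitute whatever your own install reports.

```python
import json
import urllib.request

# Placeholder values -- substitute your GPUStack server address,
# an API key created in the GPUStack UI, and a deployed model's name.
BASE_URL = "http://localhost:80/v1"       # assumed address; check your install
API_KEY = "gpustack-api-key-placeholder"  # hypothetical key
MODEL = "llama-3.2-1b-instruct"           # hypothetical deployed model

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
# Sending requires a running server: urllib.request.urlopen(req)
```

Because the API follows the OpenAI wire format, existing OpenAI SDKs can also be pointed at the server simply by overriding their base URL.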

Quick Start & Requirements

  • Installation: A single script handles installation as a service on Linux/macOS (curl -sfL https://get.gpustack.ai | sh -s -) or Windows (Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content). Docker and manual installation options are available in the documentation.
  • Prerequisites: Python 3.10-3.12. Supports NVIDIA CUDA (Compute Capability 6.0+), Apple Metal, AMD ROCm, Ascend CANN, Hygon DTK, and Moore Threads MUSA.
  • Resources: Running stable-diffusion-v3-5-large-turbo requires roughly 12 GB of VRAM and a comparable amount of disk space for the model weights.
  • Docs: Official Docs

Highlighted Details

  • Supports a wide array of AI models: LLMs (LLaMA, Mistral), VLMs, Diffusion Models (Stable Diffusion), Embedding Models, and Audio Models (Whisper).
  • Provides OpenAI-compatible APIs for common tasks like chat completions, embeddings, and image generation.
  • Features user and API key management, GPU metrics monitoring, and token usage tracking.
  • Offers broad hardware compatibility, including Apple Silicon Macs, Windows PCs, and Linux servers.
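The OpenAI-compatible surface covers more than chat; an embeddings request, for example, differs only in the endpoint path and payload shape. As above, the server address, API key, and model name are assumed placeholders.

```python
import json
import urllib.request

# Placeholder values; substitute your server, key, and a deployed embedding model.
BASE_URL = "http://localhost:80/v1"       # assumed address
API_KEY = "gpustack-api-key-placeholder"  # hypothetical key

def build_embeddings_request(texts: list[str], model: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_embeddings_request(["hello world"], model="bge-m3")  # model name is illustrative
```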

Maintenance & Community

GPUStack is licensed under the Apache License 2.0. Community support is available through the project's community channels (links are not provided in the README).

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. This permissive license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

Accelerator support for Intel oneAPI and Qualcomm AI Engine is planned but not yet available. The README also notes that the default admin password must be read from a file on the server after installation, a minor security consideration if initial setup is not handled carefully.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 116
  • Issues (30d): 220
  • Star History: 596 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
created 5 years ago
updated 3 weeks ago
Starred by Anton Bukov (Cofounder of 1inch Network), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 9 more.

exo by exo-explore

0.3%
29k
AI cluster for running models on diverse devices
created 1 year ago
updated 4 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4%
84k
C/C++ library for local LLM inference
created 2 years ago
updated 9 hours ago