aibrix by vllm-project

Cloud-native infrastructure for scalable GenAI inference

created 1 year ago
3,963 stars

Top 12.6% on sourcepulse

Project Summary

AIBrix provides cloud-native infrastructure components for scalable GenAI inference, targeting enterprises needing to deploy, manage, and scale LLMs. It offers cost-efficient, pluggable building blocks for high-density LoRA management, LLM gateway routing, and app-tailored autoscaling.

How It Works

AIBrix employs a unified AI runtime sidecar for metric standardization and model management, coupled with distributed inference capabilities. Its architecture supports distributed KV cache for high-capacity reuse and heterogeneous serving across mixed GPU configurations to reduce costs while maintaining SLO guarantees. GPU hardware failure detection is also integrated.
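
To make the cost-versus-SLO trade-off behind heterogeneous serving concrete, here is a toy sketch in Python. It is not AIBrix's actual optimizer. The GpuProfile type, the cheapest_fleet function, and every number below are hypothetical and invented for illustration. The sketch picks the cheapest single GPU type whose measured latency still meets the SLO, then sizes the replica count to cover the target request rate.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class GpuProfile:
    """Hypothetical per-GPU-type measurements for one model and workload."""
    name: str
    cost_per_hour: float    # assumed price of one replica
    p99_latency_ms: float   # assumed tail latency on this GPU type
    max_rps: float          # assumed sustainable requests/second per replica


def cheapest_fleet(
    profiles: Sequence[GpuProfile], target_rps: float, slo_ms: float
) -> Optional[Tuple[GpuProfile, int, float]]:
    """Return (profile, replicas, hourly_cost) for the cheapest GPU type
    that meets the latency SLO, or None if no type qualifies."""
    best = None
    for p in profiles:
        if p.p99_latency_ms > slo_ms:
            continue  # this GPU type cannot meet the SLO at all
        replicas = math.ceil(target_rps / p.max_rps)
        cost = replicas * p.cost_per_hour
        if best is None or cost < best[2]:
            best = (p, replicas, cost)
    return best


# Made-up profiles: a fast expensive GPU, a mid-range one, and a slow cheap one.
PROFILES = [
    GpuProfile("gpu-large", cost_per_hour=4.0, p99_latency_ms=80.0, max_rps=20.0),
    GpuProfile("gpu-small", cost_per_hour=1.0, p99_latency_ms=150.0, max_rps=6.0),
    GpuProfile("gpu-tiny", cost_per_hour=0.5, p99_latency_ms=400.0, max_rps=2.0),
]

if __name__ == "__main__":
    choice = cheapest_fleet(PROFILES, target_rps=30.0, slo_ms=200.0)
    if choice is not None:
        profile, replicas, cost = choice
        print(f"{replicas} x {profile.name} at {cost:.2f}/hour")
```

With a relaxed 200 ms SLO the cheaper mid-range GPU wins on cost. Tightening the SLO to 100 ms leaves only the large GPU eligible, which shows why mixed fleets can cut cost only when the SLO leaves headroom.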

Quick Start & Requirements

  • Install: Clone the repository and run kubectl create -k against either the nightly or the stable release (the stable release referenced is v0.2.1).
  • Prerequisites: Kubernetes cluster, kubectl.
  • Documentation: https://aibrix.io/docs
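
The install step above can be scripted. The following is a minimal sketch, assuming kubectl is on the PATH and already pointed at the target cluster. The only kubectl usage taken from the summary is `kubectl create -k`. The helper names and the two kustomization directories are hypothetical placeholders; the real paths are listed in the project documentation.

```python
import shlex
import subprocess
from typing import List, Sequence


def kustomize_create_commands(kustomize_dirs: Sequence[str]) -> List[List[str]]:
    """Build one `kubectl create -k <dir>` command per kustomization directory."""
    return [["kubectl", "create", "-k", d] for d in kustomize_dirs]


def render(commands: Sequence[Sequence[str]]) -> List[str]:
    """Return shell-quoted command lines, useful for a dry run or for logging."""
    return [shlex.join(cmd) for cmd in commands]


def apply(commands: Sequence[Sequence[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Placeholder paths (assumptions): replace with the directories
    # named in the AIBrix documentation for the release you want.
    dirs = ["path/to/dependency-kustomization", "path/to/components-kustomization"]
    for line in render(kustomize_create_commands(dirs)):
        print(line)  # dry run: print only; call apply(...) to execute
```

Printing the commands first makes it easy to review what will be created before anything touches the cluster.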

Highlighted Details

  • High-Density LoRA Management
  • LLM Gateway and Routing
  • App-Tailored Autoscaler
  • Distributed KV Cache for high-capacity reuse
  • Cost-efficient Heterogeneous Serving (mixed GPU)

Maintenance & Community

  • Active development with recent releases (v0.2.1 on 2025-03-09).
  • Community support via Slack channel: #aibrix.
  • Contributing guidelines available.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The project describes itself as an "initiative", and its quick start assumes a Kubernetes cluster. This points to a focus on orchestrated environments and likely a steeper learning curve for users unfamiliar with Kubernetes.

Health Check

  • Last commit: 20 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 95
  • Issues (30d): 67

Star History

  • 489 stars in the last 90 days
