aibrix by vllm-project

Cloud-native infrastructure for scalable GenAI inference

Created 1 year ago
4,715 stars

Top 10.4% on SourcePulse

Project Summary

AIBrix provides cloud-native infrastructure components for scalable GenAI inference, targeting enterprises needing to deploy, manage, and scale LLMs. It offers cost-efficient, pluggable building blocks for high-density LoRA management, LLM gateway routing, and app-tailored autoscaling.

How It Works

AIBrix employs a unified AI runtime sidecar for metric standardization and model management, coupled with distributed inference capabilities. Its architecture supports distributed KV cache for high-capacity reuse and heterogeneous serving across mixed GPU configurations to reduce costs while maintaining SLO guarantees. GPU hardware failure detection is also integrated.
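The app-tailored autoscaler reacts to inference-specific signals rather than raw CPU. As a hypothetical illustration (not AIBrix's actual API or algorithm), a proportional scaling rule over a serving metric such as tokens per second per replica could look like this:

```python
import math

def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Proportional scaling rule (same shape as the Kubernetes HPA formula):
    scale replica count by the ratio of the observed metric to its target,
    clamped to a configured range. All names here are illustrative."""
    if current == 0:
        return min_replicas
    raw = current * (metric / target)
    return max(min_replicas, min(max_replicas, math.ceil(raw)))
```

With a target of 100 tokens/s per replica, a load of 150 tokens/s on 2 replicas would scale up to 3, while 50 tokens/s would scale down to the minimum.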

Quick Start & Requirements

  • Install: Clone the repository and apply the manifests with kubectl create -k, choosing either the nightly build or a stable release (v0.2.1 at the time of writing).
  • Prerequisites: Kubernetes cluster, kubectl.
  • Documentation: https://aibrix.io/docs
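A sketch of the kubectl create -k install flow described above; the kustomize paths and remote-ref syntax are assumptions based on common repository layouts, so consult https://aibrix.io/docs for the authoritative commands:

```shell
# Install dependencies (CRDs, gateway, etc.) pinned to the stable tag.
# Paths are illustrative, not confirmed by this summary.
kubectl create -k "github.com/vllm-project/aibrix/config/dependency?ref=v0.2.1"

# Install the AIBrix core components at the same tag.
kubectl create -k "github.com/vllm-project/aibrix/config/default?ref=v0.2.1"
```

Dropping the ?ref=… suffix (or pointing at a cloned working tree) would track the nightly state of the default branch instead of a pinned release.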

Highlighted Details

  • High-Density LoRA Management
  • LLM Gateway and Routing
  • App-Tailored Autoscaler
  • Distributed KV Cache for high-capacity reuse
  • Cost-efficient Heterogeneous Serving (mixed GPU)
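Distributed KV cache reuse rests on matching a new request's token prefix against previously computed attention states, so only the unmatched suffix needs recomputation. A toy, in-process sketch of the prefix-matching idea (AIBrix's real cache is distributed and pluggable; all names here are illustrative):

```python
class PrefixKVCache:
    """Toy prefix-matching KV store: maps token prefixes to cached KV state.
    Illustrates the reuse concept only; a production cache would use a trie
    or block-hash index and live in a shared, distributed store."""

    def __init__(self):
        self._store = {}

    def insert(self, tokens, kv_state):
        # Cache the KV state computed for this exact token sequence.
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        # Scan from the full sequence down to length 1 and return the
        # longest cached prefix, so only the suffix must be recomputed.
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._store:
                return list(key), self._store[key]
        return [], None
```

For example, after caching the KV state for tokens [1, 2, 3], a request for [1, 2, 3, 4, 5] reuses that state and only tokens 4 and 5 need fresh computation.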

Maintenance & Community

  • Active development with recent releases (v0.2.1 on 2025-03-09).
  • Community support via Slack channel: #aibrix.
  • Contributing guidelines available.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The project describes itself as an "initiative," and its quick start assumes a Kubernetes cluster, so it targets orchestrated environments and may present a steeper learning curve for users unfamiliar with Kubernetes.

Health Check

  • Last Commit: 21 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 59
  • Issues (30d): 44
  • Star History: 64 stars in the last 30 days

Explore Similar Projects

  • sglang-jax by sgl-project (1.5%, 264 stars): High-performance LLM inference engine for JAX/TPU serving. Created 8 months ago; updated 1 day ago. Starred by Matthew Johnson (coauthor of JAX; Research Scientist at Google Brain), Roy Frostig (coauthor of JAX; Research Scientist at Google DeepMind), and 3 more.
  • llm-d by llm-d (1.7%, 3k stars): Kubernetes-native framework for distributed LLM inference. Created 11 months ago; updated 1 day ago. Starred by Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Chaoyu Yang (founder of Bento), and 3 more.
  • serve by pytorch (0%, 4k stars): Serve, optimize, and scale PyTorch models in production. Created 6 years ago; updated 8 months ago. Starred by Jeff Hammerbacher (cofounder of Cloudera), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.