aibrix by vllm-project

Cloud-native infrastructure for scalable GenAI inference

Created 1 year ago
4,243 stars

Top 11.5% on SourcePulse

Project Summary

AIBrix provides cloud-native infrastructure components for scalable GenAI inference, aimed at enterprises that need to deploy, manage, and scale LLMs. It offers cost-efficient, pluggable building blocks for high-density LoRA management, LLM gateway routing, and app-tailored autoscaling.

How It Works

AIBrix pairs a unified AI runtime sidecar, which standardizes metrics and manages models, with distributed inference capabilities. Its architecture supports a distributed KV cache for high-capacity reuse and heterogeneous serving across mixed GPU configurations, reducing costs while maintaining SLO guarantees. It also integrates GPU hardware failure detection.

Quick Start & Requirements

  • Install: Clone the repository, then run kubectl create -k to install either the nightly build or a stable release (v0.2.1 is the stable release mentioned).
  • Prerequisites: Kubernetes cluster, kubectl.
  • Documentation: https://aibrix.io/docs
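
The install step above can be sketched as a small dry-run helper that prints the two commands rather than executing them. This is only an illustration: the repository URL is inferred from the project owner and name shown on this page, and the kustomization directory is a placeholder, since the actual paths for the nightly and stable releases are given in the project documentation and not reproduced here. The helper name install_commands is hypothetical.

```python
import shlex


def install_commands(kustomize_dir,
                     repo_url="https://github.com/vllm-project/aibrix.git"):
    """Return the clone-and-install commands as argument lists (nothing is run).

    repo_url is an assumption inferred from the project owner and name.
    kustomize_dir is a placeholder for a directory inside the repository
    that contains a kustomization file; take the real path from the docs.
    """
    # Directory that `git clone` creates by default, e.g. "aibrix".
    clone_dir = repo_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    return [
        ["git", "clone", repo_url],
        # `kubectl create -k DIR` builds and creates the resources
        # described by the kustomization found in DIR.
        ["kubectl", "create", "-k", f"{clone_dir}/{kustomize_dir}"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in install_commands("path/to/kustomization"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to confirm the target cluster context and the chosen release directory before anything is created.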

Highlighted Details

  • High-Density LoRA Management
  • LLM Gateway and Routing
  • App-Tailored Autoscaler
  • Distributed KV Cache for high-capacity reuse
  • Cost-efficient Heterogeneous Serving (mixed GPU)

Maintenance & Community

  • Active development with recent releases (v0.2.1 on 2025-03-09).
  • Community support via Slack channel: #aibrix.
  • Contributing guidelines available.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The project is described as an "initiative", and its quick start requires a Kubernetes cluster. It is aimed at orchestrated environments, so users unfamiliar with Kubernetes may face a steeper learning curve.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 73
  • Issues (30d): 44
  • Star history: 203 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

1.3%
1k
Communication protocol for cost-efficient LLM collaboration
Created 7 months ago
Updated 18 hours ago
Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 9 more.

dynamo by ai-dynamo

1.0%
5k
Inference framework for distributed generative AI model serving
Created 6 months ago
Updated 15 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
Created 6 years ago
Updated 1 month ago