aibrix by vllm-project

Cloud-native infrastructure for scalable GenAI inference

Created 1 year ago
4,715 stars

Top 10.4% on SourcePulse

Project Summary

AIBrix provides cloud-native infrastructure components for scalable GenAI inference, targeting enterprises needing to deploy, manage, and scale LLMs. It offers cost-efficient, pluggable building blocks for high-density LoRA management, LLM gateway routing, and app-tailored autoscaling.

How It Works

AIBrix employs a unified AI runtime sidecar for metric standardization and model management, coupled with distributed inference capabilities. Its architecture supports distributed KV cache for high-capacity reuse and heterogeneous serving across mixed GPU configurations to reduce costs while maintaining SLO guarantees. GPU hardware failure detection is also integrated.
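The app-tailored autoscaler reacts to inference-specific signals rather than raw CPU. As a hypothetical illustration (not AIBrix's actual API or algorithm), a proportional scaling rule over a serving metric such as tokens per second per replica could look like this:

```python
import math

def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Proportional scaling rule (same shape as the Kubernetes HPA formula):
    scale replica count by the ratio of the observed metric to its target,
    clamped to a configured range. All names here are illustrative."""
    if current == 0:
        return min_replicas
    raw = current * (metric / target)
    return max(min_replicas, min(max_replicas, math.ceil(raw)))
```

With a target of 100 tokens/s per replica, a load of 150 tokens/s on 2 replicas would scale up to 3, while 50 tokens/s would scale down to the minimum.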

Quick Start & Requirements

  • Install: Clone the repository and apply the manifests with kubectl create -k, choosing either the nightly build or a stable release (v0.2.1 at the time of writing).
  • Prerequisites: Kubernetes cluster, kubectl.
  • Documentation: https://aibrix.io/docs
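A sketch of the kubectl create -k install flow described above; the kustomize paths and remote-ref syntax are assumptions based on common repository layouts, so consult https://aibrix.io/docs for the authoritative commands:

```shell
# Install dependencies (CRDs, gateway, etc.) pinned to the stable tag.
# Paths are illustrative, not confirmed by this summary.
kubectl create -k "github.com/vllm-project/aibrix/config/dependency?ref=v0.2.1"

# Install the AIBrix core components at the same tag.
kubectl create -k "github.com/vllm-project/aibrix/config/default?ref=v0.2.1"
```

Dropping the ?ref=… suffix (or pointing at a cloned working tree) would track the nightly state of the default branch instead of a pinned release.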

Highlighted Details

  • High-Density LoRA Management
  • LLM Gateway and Routing
  • App-Tailored Autoscaler
  • Distributed KV Cache for high-capacity reuse
  • Cost-efficient Heterogeneous Serving (mixed GPU)
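Distributed KV cache reuse rests on matching a new request's token prefix against previously computed attention states, so only the unmatched suffix needs recomputation. A toy, in-process sketch of the prefix-matching idea (AIBrix's real cache is distributed and pluggable; all names here are illustrative):

```python
class PrefixKVCache:
    """Toy prefix-matching KV store: maps token prefixes to cached KV state.
    Illustrates the reuse concept only; a production cache would use a trie
    or block-hash index and live in a shared, distributed store."""

    def __init__(self):
        self._store = {}

    def insert(self, tokens, kv_state):
        # Cache the KV state computed for this exact token sequence.
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        # Scan from the full sequence down to length 1 and return the
        # longest cached prefix, so only the suffix must be recomputed.
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._store:
                return list(key), self._store[key]
        return [], None
```

For example, after caching the KV state for tokens [1, 2, 3], a request for [1, 2, 3, 4, 5] reuses that state and only tokens 4 and 5 need fresh computation.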

Maintenance & Community

  • Active development with recent releases (v0.2.1 on 2025-03-09).
  • Community support via Slack channel: #aibrix.
  • Contributing guidelines available.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The project describes itself as an "initiative," and its quick start assumes a Kubernetes cluster, so it targets orchestrated environments and may present a steeper learning curve for users unfamiliar with Kubernetes.

Health Check

  • Last Commit: 21 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 59
  • Issues (30d): 44
  • Star History: 64 stars in the last 30 days

Explore Similar Projects

  • sglang-jax by sgl-project (1.5%, 264 stars): High-performance LLM inference engine for JAX/TPU serving. Created 8 months ago; updated 1 day ago. Starred by Matthew Johnson (coauthor of JAX; Research Scientist at Google Brain), Roy Frostig (coauthor of JAX; Research Scientist at Google DeepMind), and 3 more.
  • llm-d by llm-d (1.7%, 3k stars): Kubernetes-native framework for distributed LLM inference. Created 11 months ago; updated 1 day ago. Starred by Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Chaoyu Yang (founder of Bento), and 3 more.
  • serve by pytorch (0%, 4k stars): Serve, optimize, and scale PyTorch models in production. Created 6 years ago; updated 8 months ago. Starred by Jeff Hammerbacher (cofounder of Cloudera), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.