dynamo by ai-dynamo

Inference framework for distributed generative AI model serving

Created 6 months ago
5,003 stars

Top 9.9% on SourcePulse

Project Summary

NVIDIA Dynamo is a distributed inference serving framework designed for high-throughput, low-latency serving of generative AI and reasoning models across multiple nodes. It targets users needing to deploy LLMs at scale, offering features like disaggregated prefill/decode, dynamic GPU scheduling, and KV cache offloading to optimize performance and resource utilization.

How It Works

Dynamo employs a disaggregated architecture, separating prefill and decode stages to maximize GPU utilization and allow flexible throughput/latency trade-offs. It features LLM-aware request routing for efficient KV cache management and dynamic GPU scheduling to adapt to fluctuating workloads. Built with Rust for performance and Python for extensibility, it supports multiple inference backends (TRT-LLM, vLLM, SGLang) and utilizes NIXL for accelerated data transfer.
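The LLM-aware routing described above can be illustrated with a minimal sketch: a request is sent to the worker whose KV cache already holds the longest matching prefix of the prompt, so less prefill work has to be redone. This is a conceptual illustration only, not Dynamo's actual implementation; the worker and cache structures are hypothetical.

```python
# Conceptual sketch of KV-cache-aware routing (not Dynamo's actual code).
# Each worker advertises the token prefixes it has cached; a new request
# goes to the worker with the longest matching cached prefix.

def common_prefix_len(a, b):
    """Length of the shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(request_tokens, worker_caches):
    """Pick the worker whose cached prefixes overlap the request most.

    worker_caches: dict mapping worker id -> list of cached token sequences.
    Returns (worker_id, overlap_length).
    """
    best_worker, best_overlap = None, -1
    for worker, prefixes in worker_caches.items():
        overlap = max(
            (common_prefix_len(request_tokens, p) for p in prefixes),
            default=0,
        )
        if overlap > best_overlap:
            best_worker, best_overlap = worker, overlap
    return best_worker, best_overlap

caches = {
    "worker-0": [[1, 2, 3, 4]],        # cached an unrelated conversation
    "worker-1": [[7, 8, 9], [7, 8]],   # cached prefixes of this prompt
}
print(route([7, 8, 9, 10], caches))    # -> ('worker-1', 3)
```

A production router would also weigh current load and queue depth, but the prefix-matching idea is the core of why LLM-aware routing improves KV cache reuse.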

Quick Start & Requirements

  • Install: pip install ai-dynamo[all]
  • Prerequisites: Ubuntu 24.04 (recommended), python3-dev, python3-pip, python3-venv, libucx0. CUDA and specific inference backends may require additional setup.
  • Resources: Building the base Docker image is required for Kubernetes deployment. Local testing involves docker compose.
  • Docs: Roadmap, Support Matrix, Guides
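Because Dynamo exposes an OpenAI-compatible frontend, a deployment can be queried with any OpenAI-style HTTP request. The sketch below uses only the Python standard library; the endpoint URL and model name are placeholders for whatever your own deployment serves.

```python
# Sketch of querying an OpenAI-compatible chat-completions endpoint.
# The endpoint URL and model name are placeholders -- substitute the
# values from your own Dynamo deployment.
import json
import urllib.request

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder

payload = {
    "model": "my-model",  # placeholder: the model your deployment serves
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

def build_request(endpoint, body):
    """Build a standard OpenAI-style chat-completions HTTP POST request."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(ENDPOINT, payload)
# urllib.request.urlopen(req)  # uncomment against a running deployment
```

Any existing OpenAI client library should work the same way by pointing its base URL at the deployment.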

Highlighted Details

  • Supports multiple inference backends (TRT-LLM, vLLM, SGLang).
  • Disaggregated prefill and decode for throughput/latency optimization.
  • Dynamic GPU scheduling and LLM-aware request routing.
  • KV cache offloading for enhanced system throughput.
  • OpenAI-compatible frontend.

Maintenance & Community

  • Open-source first development approach.
  • Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license, though it describes the project as "fully open-source" with an "OSS (Open Source Software) first development approach."

Limitations & Caveats

The README recommends Ubuntu 24.04, suggesting potential compatibility issues on other operating systems. Building custom Docker images is necessary for Kubernetes deployments. Specific backend compatibility details are linked but require further investigation.

Health Check

  • Last Commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 583
  • Issues (30d): 91
  • Star History: 262 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

Communication protocol for cost-efficient LLM collaboration

Created 7 months ago, updated 18 hours ago
1k stars

Top 1.3% on SourcePulse