vllm-omni by vllm-project

Omni-modality model inference and serving framework

Created 3 months ago
873 stars

Top 41.2% on SourcePulse

Project Summary

vLLM-Omni is an open-source framework for efficient, cost-effective serving of omni-modality models, extending vLLM's capabilities beyond text to image, video, and audio data. It targets researchers and engineers who need high-throughput inference across diverse model architectures, including non-autoregressive ones, for complex multimodal workloads.

How It Works

The framework extends vLLM's efficient KV cache management to support omni-modality inference. It employs a fully disaggregated architecture using OmniConnector for dynamic resource allocation across pipelined stages, enabling high throughput via execution overlapping. vLLM-Omni specifically adds support for non-autoregressive models like Diffusion Transformers (DiT) and handles heterogeneous outputs, moving beyond traditional text generation.
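The excerpt includes no code, but the overlap idea can be sketched generically. The sketch below is illustrative only: the stage names and queue-based hand-off are assumptions for this sketch and do not reflect vLLM-Omni's actual OmniConnector API.

    # Illustrative pipelined-stage pattern; NOT vLLM-Omni's API.
    # Stage names and the queue hand-off are assumptions for this sketch.
    import queue
    import threading

    def run_stage(work_fn, inbox, outbox):
        # Pull items, process, push downstream; None is the stop signal.
        while (item := inbox.get()) is not None:
            outbox.put(work_fn(item))
        outbox.put(None)  # propagate shutdown to the next stage

    # Toy stages standing in for multimodal encode / DiT denoise / decode.
    stages = [lambda x: f"enc({x})", lambda x: f"gen({x})", lambda x: f"dec({x})"]
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # While request 0 occupies "gen", request 1 can already be in "enc":
    # the stages overlap, which is the throughput win described above.
    for req in ["req0", "req1", "req2"]:
        queues[0].put(req)
    queues[0].put(None)

    while (result := queues[-1].get()) is not None:
        print(result)  # dec(gen(enc(req0))), then req1, req2
    for t in threads:
        t.join()

Per the description above, the real system disaggregates stages across dynamically allocated resources rather than threads, with OmniConnector handling the hand-off between them.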

Quick Start & Requirements

  • Primary installation and usage details are available via the official documentation; a hedged install sketch follows this list.
  • Key community resources include the User Forum (discuss.vllm.ai) and the #sig-omni Slack channel (slack.vllm.ai).
  • Specific prerequisites like CUDA versions or hardware requirements are not detailed in the provided README excerpt.
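As noted above, the excerpt gives no install commands. Assuming the package is published to PyPI under the name vllm-omni (an assumption; verify against the official documentation), installation would typically be:

    # Assumed package name; confirm in the official docs before use.
    pip install vllm-omni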

Highlighted Details

  • Supports omni-modality (text, image, video, audio) and heterogeneous outputs.
  • Extends vLLM to non-autoregressive architectures (e.g., Diffusion Transformers).
  • Achieves high throughput via pipelined stage execution and efficient KV cache.
  • Offers tensor, pipeline, data, and expert parallelism for distributed inference.
  • Provides an OpenAI-compatible API server and supports streaming outputs (see the client sketch after this list).
  • Seamlessly integrates with popular Hugging Face models.
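Because the server is OpenAI-compatible, the standard openai Python client should be able to target it. The port, model id, and image URL below are placeholders, and the /v1 route and message format follow vLLM's usual OpenAI-compatible server conventions, which vllm-omni is assumed to inherit.

    # Hedged sketch: querying an OpenAI-compatible endpoint with the
    # standard openai client. base_url, model id, and image URL are
    # placeholders; check the vLLM-Omni docs for the actual values.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="placeholder-omni-model",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }],
        stream=True,  # the README highlights streaming outputs
    )
    for chunk in response:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)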

Maintenance & Community

The project welcomes contributions via its Contributing to vLLM-Omni guide. Community discussions take place on the #sig-omni Slack channel (slack.vllm.ai) and the user forum (discuss.vllm.ai). The README's latest news entry, "🔥 [2025/11] vLLM community officially released vllm-project/vllm-omni", indicates recent activity.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and linking in closed-source projects.

Limitations & Caveats

The provided README excerpt does not detail specific limitations, known bugs, or unsupported platforms. The project appears to be a recent extension of vLLM, with its maturity and stability for all supported modalities yet to be fully established.

Health Check

  • Last Commit: 14 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 158
  • Issues (30d): 86

Star History

  • 869 stars gained in the last 30 days

Explore Similar Projects

  • FastVideo by hao-ai-lab: framework for accelerated video generation. Top 1.6%, 3k stars; created 1 year ago, updated 14 hours ago. Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.
  • LitServe by Lightning-AI: AI inference pipeline framework. Top 0.3%, 4k stars; created 2 years ago, updated 22 hours ago. Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.
  • towhee by towhee-io: framework for neural data processing pipelines. Top 0.1%, 3k stars; created 4 years ago, updated 1 year ago. Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.