vLLM-Kunlun by baidu

vLLM inference acceleration for Kunlun XPU

Created 2 months ago
258 stars

Top 98.1% on SourcePulse

Project Summary

vLLM Kunlun is a community-maintained hardware plugin designed to enable the vLLM inference engine to run on Kunlun XPU hardware. It targets users and researchers who need to leverage Kunlun's specialized compute capabilities for deploying and experimenting with large language models, offering a decoupled integration strategy for broad model compatibility.

How It Works

This project implements a hardware-pluggable interface, following vLLM's architectural principles for hardware integration. That design decouples the Kunlun XPU backend from the core vLLM framework, which simplifies integration and enables efficient execution of a wide range of LLM architectures, including Transformer-based models, Mixture-of-Experts (MoE) models, embedding models, and multimodal LLMs, directly on Kunlun hardware.
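vLLM discovers out-of-tree hardware backends through its plugin system. As a minimal sketch of that pattern (the package and class names below are illustrative assumptions, not the actual vllm-kunlun code), a plugin package exposes a registration function through the `vllm.platform_plugins` entry-point group, and vLLM calls it at startup:

```python
# Sketch of vLLM's out-of-tree platform-plugin pattern.
# "vllm_kunlun" and "KunlunPlatform" are hypothetical names used for
# illustration; the real plugin's module layout may differ.

def register() -> "str | None":
    """Return the dotted path of a Platform subclass if the backend
    is usable on this machine, or None to let vLLM try other platforms."""
    try:
        import vllm_kunlun  # hypothetical plugin package  # noqa: F401
        return "vllm_kunlun.platform.KunlunPlatform"  # hypothetical class
    except ImportError:
        return None  # plugin not installed; fall back to other backends
```

Because vLLM only ever sees the registered platform class, the backend can evolve independently of the core engine, which is what makes the integration "decoupled".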

Quick Start & Requirements

  • Prerequisites: Kunlun3 P800 hardware, Ubuntu 22.04, Python 3.10 or higher, PyTorch 2.5.1 or higher, and a vLLM installation whose version matches the installed vllm-kunlun release.
  • Installation: Detailed setup instructions are available in the official QuickStart and Installation documentation.
  • Recommended Versions: The latest stable release is v0.11.0.
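Before installing, it can help to confirm the stated prerequisites. A minimal sketch (the helper below is illustrative and not part of the plugin; only the Python 3.10 and PyTorch 2.5.1 version floors come from the requirements above):

```python
import sys

def _to_tuple(version: str) -> tuple:
    """Parse a dotted version string into a comparable tuple, e.g. '3.10' -> (3, 10)."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(version: str, floor: str) -> bool:
    """True if `version` is at least `floor`, compared numerically."""
    return _to_tuple(version) >= _to_tuple(floor)

def check_prerequisites() -> dict:
    """Check the software floors listed in the Quick Start requirements."""
    results = {
        "python>=3.10": meets_minimum(
            f"{sys.version_info.major}.{sys.version_info.minor}", "3.10"
        ),
    }
    try:
        import torch  # PyTorch >= 2.5.1 is required
        # Strip local version suffixes such as "+cu121" before comparing.
        results["torch>=2.5.1"] = meets_minimum(
            torch.__version__.split("+")[0], "2.5.1"
        )
    except ImportError:
        results["torch>=2.5.1"] = False
    return results
```

Hardware (Kunlun3 P800) and OS (Ubuntu 22.04) requirements still need to be checked separately; see the official Installation documentation for the authoritative setup steps.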

Highlighted Details

  • Supports a comprehensive range of models, including Qwen (2, 2.5, 3, 3-Moe, 3-Next), Llama (2, 3, 3.1), DeepSeek (R1, V3, V3.2), Kimi-K2, MiMo-V2-Flash, and gpt-oss.
  • Incorporates support for advanced LLM techniques such as Quantization, LoRA fine-tuning, Piecewise operations, and Kunlun Graph optimizations.
  • Demonstrates high-performance computing on Kunlun3 P800, with reported benchmarks at 16 concurrent requests and 2048-token input/output lengths.

Maintenance & Community

  • The project was initiated in December 2025.
  • Significant support is provided by the KunLunXin team, who supply XPU resources for development and testing.
  • Community discussions and updates can be found via their Slack channel.

Licensing & Compatibility

  • The project is licensed under the Apache License 2.0.
  • This permissive license generally allows commercial use and integration into closed-source projects, subject to the license's attribution and notice requirements.

Limitations & Caveats

  • The plugin has specific hardware (Kunlun3 P800) and operating system (Ubuntu 22.04) requirements, which may limit its applicability on other platforms.
  • As a community-maintained project, its support structure and development velocity may differ from officially maintained vLLM components.
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 48
  • Issues (30d): 17
  • Star History: 20 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

rtp-llm by alibaba

0.3% · 1k stars
LLM inference engine for diverse applications
Created 2 years ago · Updated 17 hours ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

0.1% · 3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago · Updated 7 months ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

9.5% · 13k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 5 months ago