Hardware plugin for vLLM on Ascend NPU
This project provides a community-maintained hardware plugin for vLLM, enabling seamless execution of large language models on Ascend NPUs. It targets developers and researchers working with Ascend hardware who want to leverage vLLM's efficient inference capabilities. The plugin facilitates running various LLM architectures, including Transformer-like, MoE, Embedding, and multimodal models, on Ascend NPU platforms.
How It Works
vLLM-Ascend adheres to the vLLM RFC for hardware pluggability, creating a decoupled interface for Ascend NPU integration. This approach separates Ascend-specific optimizations and logic from the core vLLM framework, promoting maintainability and easier updates. It allows popular open-source LLMs to run efficiently on Ascend hardware by adapting vLLM's PagedAttention and other inference optimizations to the Ascend NPU architecture.
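In practice, a hardware plugin of this kind hooks into vLLM through its platform-plugin entry-point mechanism. The sketch below is illustrative only, not the plugin's actual layout: the package name my_npu_plugin, its module paths, and the entry-point key are assumptions; the pattern is a register() function exposed under the vllm.platform_plugins entry-point group that returns the import path of the platform class when the Ascend software stack is usable.

```python
# my_npu_plugin/__init__.py  (illustrative package name)
from typing import Optional


def register() -> Optional[str]:
    """Entry point that vLLM calls while discovering platform plugins.

    Returns the import path of the Platform subclass when the Ascend
    software stack is available, or None so the plugin stays inactive
    on hosts without NPU support.
    """
    try:
        import torch_npu  # noqa: F401  -- Ascend adapter for PyTorch
    except ImportError:
        return None
    return "my_npu_plugin.platform.NPUPlatform"


# Exposed to vLLM via the package metadata, e.g. in pyproject.toml:
#
# [project.entry-points."vllm.platform_plugins"]
# my_npu = "my_npu_plugin:register"
```

Keeping the registration logic this thin is what makes the decoupling work: all Ascend-specific behavior lives behind the platform class, and vLLM only needs the entry point to find it.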
Quick Start & Requirements
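A minimal usage sketch, assuming vLLM and the Ascend plugin are already installed on an Ascend NPU host (the model name below is only an example): once the plugin is present, the standard vLLM Python API is used unchanged and vLLM selects the NPU platform automatically.

```python
from vllm import LLM, SamplingParams

# With the Ascend plugin installed, vLLM detects the NPU platform at start-up;
# no Ascend-specific arguments are needed in user code.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # example model, swap in your own

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["What can Ascend NPUs run with vLLM?"], sampling)

for out in outputs:
    print(out.outputs[0].text)
```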
Highlighted Details
The project maintains a main branch (for vLLM main and 0.8.x) and specific version development branches (e.g., v0.7.3-dev).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Older development branches, such as v0.7.1-dev, are unmaintained, with only documentation fixes permitted. Users should ensure they are using a currently maintained branch that aligns with their vLLM version.
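As a quick sanity check before debugging, the installed package versions can be compared against the maintained release lines. The sketch below assumes vLLM and the plugin are installed as the distributions vllm and vllm-ascend (the plugin's distribution name is an assumption here).

```python
from importlib import metadata

# Report the installed vLLM and plugin versions so you can confirm they come
# from a maintained, matching release line.
for dist in ("vllm", "vllm-ascend"):  # "vllm-ascend" distribution name assumed
    try:
        print(f"{dist}: {metadata.version(dist)}")
    except metadata.PackageNotFoundError:
        print(f"{dist}: not installed")
```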