vllm-ascend by vllm-project

Hardware plugin for vLLM on Ascend NPU

created 6 months ago
1,004 stars

Top 37.1% on SourcePulse

Project Summary

This project provides a community-maintained hardware plugin for vLLM, enabling large language models to run on Ascend NPUs. It targets developers and researchers working with Ascend hardware who want to leverage vLLM's efficient inference capabilities. The plugin supports a range of LLM architectures, including Transformer-like, Mixture-of-Experts (MoE), embedding, and multimodal models, on Ascend NPU platforms.

How It Works

vLLM-Ascend adheres to the vLLM RFC for hardware pluggability, creating a decoupled interface for Ascend NPU integration. This approach separates Ascend-specific optimizations and logic from the core vLLM framework, promoting maintainability and easier updates. It allows popular open-source LLMs to run efficiently on Ascend hardware by adapting vLLM's PagedAttention and other inference optimizations to the Ascend NPU architecture.

Quick Start & Requirements

  • Installation: Refer to the official QuickStart and Installation guides.
  • Prerequisites:
    • Hardware: Atlas 800I A2 Inference, Atlas A2 Training series.
    • OS: Linux.
    • Software: Python >= 3.9, < 3.12; CANN >= 8.0.0; PyTorch >= 2.5.1, torch-npu >= 2.5.1; vLLM (matching version).
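The Python constraint above can be verified before installing anything heavier. A hypothetical preflight helper (the function name is illustrative, not part of the project; CANN and torch-npu versions would need their own checks):

```python
import sys

def python_version_ok(version=None):
    """Check the documented constraint: Python >= 3.9 and < 3.12."""
    v = tuple(version or sys.version_info[:2])
    return (3, 9) <= v < (3, 12)

# 3.11 falls inside the supported window; 3.12 does not.
print(python_version_ok((3, 11)), python_version_ok((3, 12)))  # → True False
```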

Highlighted Details

  • Officially recognized by the vLLM community for Ascend backend support.
  • Supports a wide range of LLM architectures including Transformer-like, MoE, Embedding, and multimodal models.
  • Adheres to the vLLM hardware pluggable RFC for modular integration.
  • Maintained branches include main (for vLLM main and 0.8.x) and specific version development branches (e.g., v0.7.3-dev).

Maintenance & Community

  • Active community maintenance with weekly meetings (Wednesdays, 15:00-16:00 UTC+8).
  • Community channels include #sig-ascend, Users Forum, and links to documentation and meetup slides.
  • Contributions are welcomed via bug reports (issues) and usage questions (forum).

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Compatible with commercial use and closed-source linking due to permissive licensing.

Limitations & Caveats

Older development branches, such as v0.7.1-dev, are unmaintained, with only documentation fixes permitted. Users should ensure they are using a currently maintained branch that aligns with their vLLM version.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 497
  • Issues (30d): 168
  • Star History: 133 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Junyang Lin (Core Maintainer of Alibaba Qwen), and 5 more.

LightLLM by ModelTC

0.9% · 4k stars
Python framework for LLM inference and serving
created 2 years ago
updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Elvis Saravia (Founder of DAIR.AI), and 1 more.

LMCache by LMCache

2.8% · 4k stars
LLM serving engine extension for reduced TTFT and increased throughput
created 1 year ago
updated 1 day ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Vincent Weisser (Cofounder of Prime Intellect), and 9 more.

verl by volcengine

2.2% · 12k stars
RL training library for LLMs
created 9 months ago
updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Clement Delangue (Cofounder of Hugging Face), and 42 more.

vllm by vllm-project

1.4% · 55k stars
LLM serving engine for high-throughput, memory-efficient inference
created 2 years ago
updated 1 day ago