vllm-ascend by vllm-project

Hardware plugin for vLLM on Ascend NPU

Created 8 months ago
1,203 stars

Top 32.4% on SourcePulse

Project Summary

This project provides a community-maintained hardware plugin for vLLM, enabling seamless execution of large language models on Ascend NPUs. It targets developers and researchers working with Ascend hardware who want to leverage vLLM's efficient inference capabilities. The plugin facilitates running various LLM architectures, including Transformer-like, MoE, Embedding, and multimodal models, on Ascend NPU platforms.

How It Works

vLLM-Ascend adheres to the vLLM RFC for hardware pluggability, creating a decoupled interface for Ascend NPU integration. This approach separates Ascend-specific optimizations and logic from the core vLLM framework, promoting maintainability and easier updates. It allows popular open-source LLMs to run efficiently on Ascend hardware by adapting vLLM's PagedAttention and other inference optimizations to the Ascend NPU architecture.
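
As a rough sketch of this plugin mechanism (the entry-point group vllm.platform_plugins, the register function, and the NPUPlatform import path are assumptions based on the RFC's pattern, not verified against this repository's sources):

    # Sketch of how a vLLM platform plugin announces itself via Python
    # entry points. Assumed packaging metadata in pyproject.toml:
    #
    #   [project.entry-points."vllm.platform_plugins"]
    #   ascend = "vllm_ascend:register"

    def register() -> str:
        # vLLM calls every function registered in the group at startup;
        # returning the import path of a Platform subclass makes that
        # backend available. The path below is illustrative.
        return "vllm_ascend.platform.NPUPlatform"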

Quick Start & Requirements

  • Installation: Refer to the official QuickStart and Installation guides; a minimal usage sketch follows this list.
  • Prerequisites:
    • Hardware: Atlas 800I A2 Inference, Atlas A2 Training series.
    • OS: Linux.
    • Software: Python >= 3.9, < 3.12; CANN >= 8.0.0; PyTorch >= 2.5.1, torch-npu >= 2.5.1; vLLM (matching version).
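
A minimal offline-inference sketch using vLLM's standard Python API, assuming the packages above are installed per the guides; the model name is an arbitrary example, not a project recommendation:

    # Assumed install flow (exact version pins are in the Installation guide):
    #   pip install torch-npu vllm vllm-ascend
    from vllm import LLM, SamplingParams

    # Any vLLM-supported architecture should work; Qwen/Qwen2.5-7B-Instruct
    # is an illustrative choice.
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
    params = SamplingParams(temperature=0.8, max_tokens=64)

    # With vllm-ascend installed, vLLM discovers the Ascend backend
    # through its plugin mechanism; no extra code is needed here.
    for output in llm.generate(["Hello, Ascend NPU!"], params):
        print(output.outputs[0].text)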

Highlighted Details

  • Officially recognized by the vLLM community for Ascend backend support.
  • Supports a wide range of LLM architectures including Transformer-like, MoE, Embedding, and multimodal models.
  • Adheres to the vLLM hardware pluggable RFC for modular integration.
  • Maintained branches include main (for vLLM main and 0.8.x) and specific version development branches (e.g., v0.7.3-dev).

Maintenance & Community

  • Active community maintenance with weekly meetings (Wednesdays, 15:00-16:00 UTC+8).
  • Community channels include #sig-ascend, Users Forum, and links to documentation and meetup slides.
  • Contributions are welcomed via bug reports (issues) and usage questions (forum).

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Compatible with commercial use and closed-source linking due to permissive licensing.

Limitations & Caveats

Older development branches, such as v0.7.1-dev, are unmaintained, with only documentation fixes permitted. Users should ensure they are using a currently maintained branch that aligns with their vLLM version.

Health Check

  • Last Commit: 6 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 427
  • Issues (30d): 199
  • Star History: 85 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

High-performance C++ LLM inference library

Created 2 years ago
Updated 2 weeks ago
4k stars

Top 0.6% on SourcePulse