vllm-ascend by vllm-project

Hardware plugin for vLLM on Ascend NPU

created 6 months ago
1,004 stars

Top 37.1% on SourcePulse

Project Summary

This project provides a community-maintained hardware plugin for vLLM, enabling large language models to run on Ascend NPUs. It targets developers and researchers working with Ascend hardware who want to leverage vLLM's efficient inference capabilities. The plugin supports a range of LLM architectures, including Transformer-like, Mixture-of-Experts (MoE), embedding, and multimodal models, on Ascend NPU platforms.

How It Works

vLLM-Ascend adheres to the vLLM RFC for hardware pluggability, creating a decoupled interface for Ascend NPU integration. This approach separates Ascend-specific optimizations and logic from the core vLLM framework, promoting maintainability and easier updates. It allows popular open-source LLMs to run efficiently on Ascend hardware by adapting vLLM's PagedAttention and other inference optimizations to the Ascend NPU architecture.

Quick Start & Requirements

  • Installation: Refer to the official QuickStart and Installation guides.
  • Prerequisites:
    • Hardware: Atlas 800I A2 Inference, Atlas A2 Training series.
    • OS: Linux.
    • Software: Python >= 3.9, < 3.12; CANN >= 8.0.0; PyTorch >= 2.5.1, torch-npu >= 2.5.1; vLLM (matching version).
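The Python constraint above can be verified before installing anything heavier. A hypothetical preflight helper (the function name is illustrative, not part of the project; CANN and torch-npu versions would need their own checks):

```python
import sys

def python_version_ok(version=None):
    """Check the documented constraint: Python >= 3.9 and < 3.12."""
    v = tuple(version or sys.version_info[:2])
    return (3, 9) <= v < (3, 12)

# 3.11 falls inside the supported window; 3.12 does not.
print(python_version_ok((3, 11)), python_version_ok((3, 12)))  # → True False
```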

Highlighted Details

  • Officially recognized by the vLLM community for Ascend backend support.
  • Supports a wide range of LLM architectures including Transformer-like, MoE, Embedding, and multimodal models.
  • Adheres to the vLLM hardware pluggable RFC for modular integration.
  • Maintained branches include main (for vLLM main and 0.8.x) and specific version development branches (e.g., v0.7.3-dev).

Maintenance & Community

  • Active community maintenance with weekly meetings (Wednesdays, 15:00-16:00 UTC+8).
  • Community channels include #sig-ascend, Users Forum, and links to documentation and meetup slides.
  • Contributions are welcomed via bug reports (issues) and usage questions (forum).

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Compatible with commercial use and closed-source linking due to permissive licensing.

Limitations & Caveats

Older development branches, such as v0.7.1-dev, are unmaintained, with only documentation fixes permitted. Users should ensure they are using a currently maintained branch that aligns with their vLLM version.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 497
  • Issues (30d): 168
  • Star History: 133 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Junyang Lin (Core Maintainer of Alibaba Qwen), and 5 more.

LightLLM by ModelTC

0.9% · 4k stars
Python framework for LLM inference and serving
created 2 years ago
updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Elvis Saravia (Founder of DAIR.AI), and 1 more.

LMCache by LMCache

2.8% · 4k stars
LLM serving engine extension for reduced TTFT and increased throughput
created 1 year ago
updated 1 day ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Vincent Weisser (Cofounder of Prime Intellect), and 9 more.

verl by volcengine

2.2% · 12k stars
RL training library for LLMs
created 9 months ago
updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Clement Delangue (Cofounder of Hugging Face), and 42 more.

vllm by vllm-project

1.4% · 55k stars
LLM serving engine for high-throughput, memory-efficient inference
created 2 years ago
updated 1 day ago