intel-npu-acceleration-library by intel

Python library for Intel NPU acceleration (now end-of-life)

created 1 year ago
680 stars

Top 50.8% on sourcepulse

View on GitHub
Project Summary

This Python library aimed to accelerate AI computations on Intel Neural Processing Units (NPUs), targeting developers working with Intel Core Ultra processors. It provided low-level access to NPU hardware for high-speed matrix operations and model inference, with the goal of boosting application efficiency.

How It Works

The library leverages Intel's NPU architecture, which includes dedicated Neural Compute Engines for AI operations like matrix multiplication and convolution, and Streaming Hybrid Architecture Vector Engines for general computing. It utilizes compiler technology to optimize AI workloads by tiling compute and data flow, maximizing utilization of on-chip SRAM and minimizing DRAM transfers for performance and power efficiency.
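The tiling idea can be illustrated with a plain NumPy sketch (a conceptual analogy only, not the library's compiler): the matrix product is computed block by block, so each sub-problem touches a small working set that could stay resident in fast on-chip memory instead of streaming whole matrices from DRAM.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 32) -> np.ndarray:
    """Block-tiled matrix multiply. Each (tile x tile) sub-problem works on a
    small slice of the operands, mimicking how an NPU compiler keeps tiles in
    on-chip SRAM and minimizes DRAM traffic."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):          # rows of the output tile
        for j in range(0, n, tile):      # columns of the output tile
            for p in range(0, k, tile):  # reduction dimension, tile by tile
                out[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return out

# Tiled result matches the untiled product (slicing handles ragged edges).
rng = np.random.default_rng(0)
a, b = rng.standard_normal((96, 80)), rng.standard_normal((80, 64))
assert np.allclose(tiled_matmul(a, b), a @ b)
```

The hardware compiler additionally schedules data movement and fuses operations, but the locality principle is the same.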

Quick Start & Requirements

  • Install via pip: pip install intel-npu-acceleration-library
  • Requires an available NPU (check system compatibility).
  • Supported OS: Ubuntu (Linux), Windows. macOS is not supported.
  • Recommended: Latest NPU drivers.
  • Documentation: Intel® NPU Acceleration Library Documentation

Highlighted Details

  • Supports 8-bit quantization, 4-bit quantization and GPTQ, NPU-native mixed precision, and Float16.
  • Integrates with torch.compile for NPU optimization (Windows torch.compile not supported; use explicit intel_npu_acceleration_library.compile).
  • Includes examples for running MatMul operations and Hugging Face models (e.g., TinyLlama) on the NPU.
  • Feature roadmap included key enhancements like BFloat16 support and NPU/GPU hetero compute (some features marked as not implemented).
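As a rough illustration of what the 8-bit quantization feature entails (a plain NumPy sketch of symmetric per-tensor int8 quantization, not the library's actual implementation): weights are scaled so the largest magnitude maps to 127, stored as int8, and dequantized with the same scale at compute time.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor 8-bit quantization: pick a scale so the largest
    magnitude maps to 127, then round to the nearest int8 value."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per element is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

This halves or quarters weight memory versus Float16/Float32 at the cost of bounded rounding error; schemes like GPTQ refine the rounding to minimize accuracy loss.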

Maintenance & Community

This project is no longer under active management by Intel and has been archived. Intel has ceased development, maintenance, bug fixes, and contributions. The project is available for reference, and users are encouraged to fork it for independent development. Intel recommends adopting OpenVINO™ and OpenVINO™ GenAI for NPU acceleration.

Licensing & Compatibility

The license is not explicitly stated in the provided README text. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The project is officially End-of-Life and will receive no further updates or maintenance from Intel. macOS is not supported, and torch.compile is not supported on Windows. Users are directed to OpenVINO™ and OpenVINO™ GenAI for current NPU acceleration solutions.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

2.3%
3k
High-performance 4-bit diffusion model inference engine
created 9 months ago
updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 16 more.

flash-attention by Dao-AILab

0.6%
19k
Fast, memory-efficient attention implementation
created 3 years ago
updated 2 days ago