LLM acceleration library for Intel XPU (GPU, NPU, CPU)
This library accelerates local LLM inference and fine-tuning on Intel hardware, targeting developers and researchers seeking to leverage Intel GPUs (iGPU, Arc, Flex, Max), NPUs, and CPUs. It offers seamless integration with popular LLM frameworks and tools, enabling efficient deployment of a wide range of LLMs with advanced optimizations and low-bit quantization.
How It Works
IPEX-LLM builds on Intel Extension for PyTorch (IPEX) to optimize LLM operations for Intel's XPU architecture. It implements state-of-the-art LLM optimizations, including low-bit quantization (INT4, FP8, FP6) and techniques like Self-Speculative Decoding, to significantly boost inference speed and reduce memory footprint. The library also supports distributed inference strategies such as pipeline parallelism for running larger models across multiple Intel GPUs.
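As a rough sketch of how the low-bit path is typically used (assuming the `optimize_model` entry point described in the IPEX-LLM documentation; names and defaults may vary by release), a stock Hugging Face model can be converted to INT4 weights and moved to an Intel GPU:

```python
# Sketch: wrapping a standard Hugging Face model with IPEX-LLM low-bit optimization.
# Assumes ipex-llm and transformers are installed; optimize_model and the
# "sym_int4" option follow the IPEX-LLM docs and may differ between releases.
import torch
from transformers import AutoModelForCausalLM
from ipex_llm import optimize_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # example model id
    torch_dtype=torch.float16,
)
model = optimize_model(model, low_bit="sym_int4")  # quantize weights to INT4
model = model.to("xpu")                            # run on an Intel GPU
```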
Quick Start & Requirements
pip install --pre --upgrade ipex-llm
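A minimal end-to-end sketch, assuming the transformers-style AutoModelForCausalLM wrapper from `ipex_llm.transformers` (per the project's quick-start documentation) and an Intel GPU exposed as the "xpu" device:

```python
# Minimal generation sketch using IPEX-LLM's drop-in replacement for the
# Hugging Face AutoModelForCausalLM class; exact arguments may vary by release.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-hf"   # example model id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,        # INT4 weight quantization at load time
    trust_remote_code=True,
).to("xpu")                    # move to the Intel GPU

inputs = tokenizer("What does an NPU do?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```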
Specific guides for Windows GPU, Linux GPU, and NPU are available.
Highlighted Details
Maintenance & Community
The library was formerly published as bigdl-llm; the rename to ipex-llm should be noted when following older guides.
Licensing & Compatibility
Limitations & Caveats