ipex-llm by intel

LLM acceleration library for Intel XPU (GPU, NPU, CPU)

created 9 years ago
8,166 stars

Top 6.4% on sourcepulse

View on GitHub
Project Summary

This library accelerates local LLM inference and fine-tuning on Intel hardware, targeting developers and researchers seeking to leverage Intel GPUs (iGPU, Arc, Flex, Max), NPUs, and CPUs. It offers seamless integration with popular LLM frameworks and tools, enabling efficient deployment of a wide range of LLMs with advanced optimizations and low-bit quantization.

How It Works

IPEX-LLM leverages Intel Extension for PyTorch (IPEX) to optimize LLM operations for Intel's XPU architecture. It implements state-of-the-art LLM optimizations, including low-bit quantization (INT4, FP8, FP6) and techniques such as Self-Speculative Decoding, to significantly boost inference speed and reduce memory footprint. The library also supports distributed inference strategies such as pipeline parallelism for running larger models across multiple Intel GPUs.
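
As a concrete illustration of the low-bit path, the sketch below loads a checkpoint with INT4 weights through the library's HuggingFace-style wrapper. The module path and the load_in_low_bit value follow the project's documented API but should be verified against the current docs; the model name is only a placeholder.

    # Minimal low-bit loading sketch (names per ipex-llm docs; verify before use).
    from ipex_llm.transformers import AutoModelForCausalLM

    # Weights are quantized on the fly at load time: "sym_int4" requests symmetric
    # INT4; other documented low-bit formats (e.g. FP8) use other strings.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-chat-hf",   # placeholder checkpoint
        load_in_low_bit="sym_int4",
        trust_remote_code=True,
    )
    model = model.to("xpu")  # move the quantized model to the Intel GPU device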

Quick Start & Requirements

  • Installation: typically via pip install ipex-llm, with platform-specific extras and index URLs covered in the dedicated Windows GPU, Linux GPU, and NPU guides (see the quick-start sketch after this list).
  • Prerequisites: Python and PyTorch; Intel GPU or NPU hardware is required for hardware acceleration. CUDA is not required.
  • Resources: Setup time varies; running LLMs requires significant VRAM/RAM depending on the model size and quantization.
  • Links: Quickstart Guides, Verified Models
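
A minimal end-to-end quick-start sketch for an Intel GPU is shown below. The ipex-llm[xpu] extra, the example model, and the prompt are illustrative assumptions; the exact install command (extras, extra index URLs) differs per platform, so follow the linked quickstart guides.

    # Hypothetical quick-start for Intel GPU (follow the official guides for the
    # exact install command, e.g.: pip install --pre --upgrade "ipex-llm[xpu]").
    import torch
    from transformers import AutoTokenizer
    from ipex_llm.transformers import AutoModelForCausalLM

    model_path = "Qwen/Qwen2-1.5B-Instruct"  # illustrative; see the verified-models list
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, load_in_4bit=True, trust_remote_code=True
    ).to("xpu")

    inputs = tokenizer("What does ipex-llm accelerate?", return_tensors="pt").to("xpu")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))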

Highlighted Details

  • Supports over 70 LLMs, including Llama, Mistral, Mixtral, Gemma, and Qwen.
  • Offers low-bit quantization (INT4, FP8, FP6, INT2) for reduced memory and faster inference.
  • Provides seamless integration with llama.cpp, Ollama, HuggingFace Transformers, LangChain, LlamaIndex, vLLM, and DeepSpeed.
  • Includes support for fine-tuning techniques like LoRA, QLoRA, DPO, QA-LoRA, and ReLoRA on Intel GPUs.

Maintenance & Community

  • Actively developed by Intel.
  • Formerly published as bigdl-llm; migration guidance is provided.
  • Support channels via GitHub Issues.

Licensing & Compatibility

  • Apache 2.0 License.
  • Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

  • Performance optimizations are specific to Intel hardware; the library provides no acceleration on non-Intel GPUs.
  • NPU support (Intel Core Ultra processors) is still experimental.
  • Some advanced features, such as INT2 quantization, depend on llama.cpp's quantization mechanisms.
Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 11
  • Issues (30d): 20
  • Star history: 377 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

Top 2.1% · 3k stars
High-performance 4-bit diffusion model inference engine
created 8 months ago
updated 13 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

Top 0.6% · 11k stars
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago
updated 17 hours ago