intel-extension-for-pytorch by intel

PyTorch extension for performance boost on Intel platforms

Created 5 years ago
1,986 stars

Top 22.2% on SourcePulse

View on GitHub
Project Summary

This package extends PyTorch to optimize performance on Intel hardware, targeting developers and researchers working with AI models, particularly Large Language Models (LLMs). It leverages Intel's specialized hardware instructions, such as AVX-512 VNNI and AMX on CPUs and XMX on discrete GPUs, to accelerate computations, and offers an xpu device for Intel discrete GPU acceleration.

How It Works

The extension integrates with PyTorch to automatically apply optimizations for Intel architectures. It specifically targets LLMs with techniques such as an indirect-access KV cache, fused RoPE (rotary position embedding), and customized linear kernels. These optimizations aim to deliver significant performance gains over standard PyTorch on compatible Intel hardware, enabling faster training and inference for demanding AI workloads.
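As a rough illustration, a minimal sketch of applying the extension's LLM path, assuming the package and Hugging Face Transformers are installed (the model name is only an example, and the exact ipex.llm.optimize arguments may differ across releases; see the LLM Example link below):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Load a supported LLM in BF16 (model name chosen for illustration).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
model.eval()

# Apply the LLM-specific optimizations described above (indirect-access
# KV cache, fused RoPE, customized linear kernels) for inference.
model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)
```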

Quick Start & Requirements

  • Installation: Typically installed via pip (a minimal sketch follows this list).
  • Prerequisites: Requires PyTorch. Optimized performance relies on Intel hardware with AVX-512 VNNI, AMX, or XMX capabilities. GPU acceleration requires an Intel discrete GPU.
  • Resources: Performance benefits are hardware-dependent.
  • Links: CPU Quick Start, GPU Quick Start, Documentation, LLM Example
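A hedged sketch of a typical CPU install-and-run flow (torchvision is used here only to get a demo model; exact pip arguments for GPU builds differ, so consult the quick-start links above):

```python
# Install (CPU build): python -m pip install intel-extension-for-pytorch
import torch
import intel_extension_for_pytorch as ipex
import torchvision.models as models

model = models.resnet50(weights=None).eval()

# ipex.optimize applies Intel-specific operator fusions and memory-layout
# changes; BF16 benefits from AVX-512/AMX-capable CPUs.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```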

Highlighted Details

  • Extensive LLM support, including Llama, Qwen, Phi, Mistral, and others, with optimizations for FP32, BF16, INT8, and INT4 quantization.
  • Provides module-level optimization APIs (prototype) for custom LLM acceleration.
  • Supports the PyTorch xpu device for Intel discrete GPU acceleration (a short sketch follows this list).
  • Optimizations include an indirect-access KV cache, fused RoPE, and customized linear kernels.
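To illustrate the xpu device, a minimal sketch assuming a GPU build of the extension and a supported Intel discrete GPU:

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the xpu backend

if torch.xpu.is_available():
    # Tensors and modules move to the Intel GPU exactly like "cuda".
    x = torch.randn(4096, 4096, device="xpu")
    w = torch.randn(4096, 4096, device="xpu")
    y = x @ w                   # matmul dispatched to XMX-capable hardware
    torch.xpu.synchronize()     # wait for the asynchronous kernel to finish
    print(y.device)             # xpu:0
else:
    print("No Intel GPU / xpu backend available")
```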

Maintenance & Community

  • Managed via GitHub issues for bug tracking and feature requests.
  • GitHub Issues

Licensing & Compatibility

  • License: Apache License, Version 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Performance gains are tied exclusively to Intel hardware. The module-level optimization APIs are marked as a prototype feature, so they may change or be unstable across releases.

Health Check

  • Last Commit: 10 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 8
  • Star History: 16 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

0.7% · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago
Updated 2 months ago
Starred by Lianmin Zheng (Coauthor of SGLang, vLLM), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB

0.1% · 8k stars
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 1 year ago
Updated 3 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 40 more.

unsloth by unslothai

0.6% · 48k stars
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 1 year ago
Updated 5 hours ago