intel-extension-for-pytorch by intel

PyTorch extension for performance boost on Intel platforms

created 5 years ago
1,917 stars

Top 23.2% on sourcepulse

Project Summary

This package extends PyTorch to optimize performance on Intel hardware, targeting developers and researchers working with AI models, particularly Large Language Models (LLMs). It leverages Intel's specialized hardware instructions (AVX-512 VNNI and AMX on CPUs, XMX on discrete GPUs) to accelerate computations, and offers an `xpu` device for Intel discrete GPU acceleration.
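Typical usage centers on `ipex.optimize`, which rewrites an existing model to use Intel-optimized kernels. A minimal sketch (the `try`/`except` fallback is only for illustration; the toy model is hypothetical):

```python
import torch

# A tiny stand-in model; any eval-mode nn.Module is handled the same way.
model = torch.nn.Linear(16, 4).eval()

try:
    import intel_extension_for_pytorch as ipex
    # ipex.optimize swaps in Intel-optimized kernels and applies operator
    # fusion; passing dtype=torch.bfloat16 instead would enable the
    # BF16/AMX path on supporting CPUs.
    model = ipex.optimize(model, dtype=torch.float32)
except ImportError:
    pass  # extension not installed: fall back to stock PyTorch

with torch.no_grad():
    out = model(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```

The optimized model keeps the standard PyTorch interface, so the rest of the pipeline is unchanged.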

How It Works

The extension integrates with PyTorch to automatically apply optimizations for Intel architectures. It specifically targets LLMs by implementing techniques such as indirect-access KV cache, fused RoPE (rotary position embedding), and customized linear kernels. This approach aims to provide significant performance gains over standard PyTorch implementations on compatible Intel hardware, enabling faster training and inference for demanding AI workloads.
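To make "fused RoPE" concrete, the math being fused is a position-dependent rotation of feature pairs. A plain-Python sketch of the unfused computation (the `rope` helper is hypothetical; ipex fuses this rotation into the surrounding attention kernels rather than materializing sin/cos tables separately):

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotary position embedding on one token vector (illustrative only).

    Each feature pair (x1, x2) is rotated by an angle that grows with the
    token position and shrinks with the pair index.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x1, x2 = vec[i], vec[i + 1]
        out.append(x1 * math.cos(theta) - x2 * math.sin(theta))
        out.append(x1 * math.sin(theta) + x2 * math.cos(theta))
    return out

# Position 0 leaves the vector unchanged (cos 0 = 1, sin 0 = 0).
print(rope([1.0, 0.0, 1.0, 0.0], pos=0))  # [1.0, 0.0, 1.0, 0.0]
```

Because the rotation is pure elementwise trigonometry, it is a natural candidate for kernel fusion: done naively, it costs extra memory traffic for the sin/cos tables on every attention call.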

Quick Start & Requirements

  • Installation: Typically installed via pip (`pip install intel-extension-for-pytorch`); GPU builds are distributed through a separate wheel index.
  • Prerequisites: Requires PyTorch. Optimized performance relies on Intel hardware with AVX-512 VNNI, AMX, or XMX capabilities. For GPU acceleration, an Intel discrete GPU is needed.
  • Resources: Performance benefits are hardware-dependent.
  • Links: CPU Quick Start, GPU Quick Start, Documentation, LLM Example

Highlighted Details

  • Extensive LLM support, including Llama, Qwen, Phi, Mistral, and others, with optimizations for FP32, BF16, INT8, and INT4 quantization.
  • Provides module-level optimization APIs (prototype) for custom LLM acceleration.
  • Supports the PyTorch `xpu` device for Intel discrete GPU acceleration.
  • Optimizations include indirect-access KV cache, fused RoPE, and customized linear kernels.
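The `xpu` device mentioned above plugs into the standard PyTorch device model. A minimal sketch that falls back to CPU when the extension or an Intel GPU is absent (the fallback logic is illustrative, not part of the library):

```python
import torch

try:
    # Importing the extension registers the "xpu" backend with PyTorch.
    import intel_extension_for_pytorch  # noqa: F401
    use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
except ImportError:
    use_xpu = False  # extension not installed

device = torch.device("xpu" if use_xpu else "cpu")
model = torch.nn.Linear(4, 2).to(device)
out = model(torch.randn(8, 4, device=device))
print(device.type, tuple(out.shape))
```

Once the device is selected, tensors and modules move with the usual `.to(device)` calls, so existing CUDA-style code ports with a device-string change.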

Maintenance & Community

  • Managed via GitHub issues for bug tracking and feature requests.
  • GitHub Issues

Licensing & Compatibility

  • License: Apache License, Version 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Performance gains are tied exclusively to Intel hardware. The module-level optimization APIs are marked as a prototype feature and may change or break between releases.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 8
  • Star history: 92 stars in the last 90 days

Explore Similar Projects

Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), Anton Bukov (cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

Top 0.1%, 30k stars. Minimalist deep learning framework for education and exploration. Created 4 years ago, updated 20 hours ago.