Transformer toolkit for GenAI/LLM acceleration on Intel platforms
This toolkit accelerates Transformer-based models, particularly Large Language Models (LLMs), across Intel hardware (Gaudi2, CPUs, GPUs). It targets developers and researchers seeking to optimize LLM performance through advanced compression techniques and provides a customizable chatbot framework, NeuralChat.
How It Works
The extension integrates with Hugging Face Transformers, leveraging Intel® Neural Compressor for model compression. It employs advanced software optimizations and custom runtimes, including techniques from published research like "Fast Distilbert on CPUs" and "QuaLA-MiniLM," to achieve efficient inference and fine-tuning. It also offers a C/C++ inference engine with weight-only quantization kernels for Intel CPUs and GPUs.
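To make the weight-only quantization idea concrete, here is a minimal, self-contained sketch of the concept (plain Python, not the toolkit's actual kernels or API): each weight row is stored as low-bit integers with a per-row scale, and dequantization happens on the fly during the matrix-vector product while activations stay in float.

```python
# Illustrative weight-only quantization sketch (assumed simplification,
# NOT the toolkit's real implementation): symmetric per-row INT4 weights
# with one float scale per row, dequantized during matvec.

def quantize_row(row, bits=4):
    """Symmetric per-row quantization: returns (int weights, scale)."""
    qmax = 2 ** (bits - 1) - 1              # 7 for INT4
    scale = max(abs(w) for w in row) / qmax or 1.0
    return [round(w / scale) for w in row], scale

def matvec_woq(q_rows, scales, x):
    """Matrix-vector product with on-the-fly dequantization."""
    return [scale * sum(q * xi for q, xi in zip(row, x))
            for row, scale in zip(q_rows, scales)]

# Example: quantize a tiny 2x3 weight matrix, then run inference.
W = [[0.7, -1.4, 0.2], [0.05, 0.3, -0.6]]
x = [1.0, 2.0, 3.0]
packed = [quantize_row(row) for row in W]
q_rows = [q for q, _ in packed]
scales = [s for _, s in packed]
y = matvec_woq(q_rows, scales, x)           # close to the float result [-1.5, -1.15]
```

Storing 4-bit integers plus one scale per row shrinks weight memory roughly 4x versus FP16, which is the main lever these kernels pull for memory-bound LLM inference.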
Quick Start & Requirements
pip install intel-extension-for-transformers
Hardware-specific dependencies are listed in requirements_cpu.txt, requirements_hpu.txt, and requirements_xpu.txt.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats