llm-scaler by intel

GenAI acceleration for Intel Arc Pro GPUs

Created 10 months ago
284 stars

Top 91.9% on SourcePulse

Project Summary

Summary

LLM Scaler is an Intel GenAI solution optimized for text, image, and video generation on Intel® Arc™ Pro B60 GPUs. It integrates popular frameworks such as vLLM, ComfyUI, SGLang Diffusion, and Xinference to deliver high performance for state-of-the-art models, targeting users and developers who want efficient generative AI on Intel hardware.

How It Works

The project provides optimized builds for diverse GenAI models, leveraging vLLM for text generation with features such as CCL support, INT4/FP8 quantization, and multi-modal capabilities. The "Omni" component extends coverage to image, video, and audio generation via ComfyUI (Omni Studio) and SGLang Diffusion/Xinference (Omni Serving), offering OpenAI-API compatible endpoints. This approach maximizes throughput and minimizes latency on Arc Pro B60 hardware.
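Because the Omni Serving endpoints described above follow the OpenAI API shape, any OpenAI-compatible client can talk to them. Below is a minimal sketch of a /chat/completions request body; the base URL, port, and model name are assumptions and will depend on your deployment:

```python
import json

# Assumed values -- substitute the endpoint and model of your deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen/Qwen2.5-7B-Instruct"  # placeholder model name

# An OpenAI-compatible /chat/completions request body.
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

# To send it against a running server:
#   import urllib.request
#   req = urllib.request.Request(
#       f"{BASE_URL}/chat/completions",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
print(json.dumps(payload, indent=2))
```

The same payload works for any of the text models listed under "Highlighted Details", since the server exposes a single OpenAI-style interface regardless of the backing framework.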

Quick Start & Requirements

  • Installation: Primarily distributed via Docker images; specific Python package installations may be needed (e.g., transformers==5.0.0rc0). Refer to "Getting Started" documentation.
  • Prerequisites: Requires Intel® Arc™ Pro B60 GPUs. Supports Python 3.12 and PyTorch versions (e.g., 2.9, 2.10).
  • Links: Docker image releases are available. "Getting Started" guides are referenced.
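After starting one of the Docker images, a quick way to confirm the service is up is to probe the OpenAI-compatible model-listing endpoint. A minimal sketch, assuming a hypothetical local server at port 8000 (the actual base URL depends on how the container was launched):

```python
import urllib.error
import urllib.request

# Assumed endpoint -- adjust host/port to match your container's settings.
BASE_URL = "http://localhost:8000/v1"

def models_url(base: str) -> str:
    """Build the OpenAI-compatible /models listing URL from a base URL."""
    return base.rstrip("/") + "/models"

def check_server(base: str = BASE_URL) -> bool:
    """Return True if an OpenAI-compatible server answers at `base`."""
    try:
        with urllib.request.urlopen(models_url(base), timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("server reachable:", check_server())
```

A 200 response with a JSON model list indicates the serving stack is healthy; a refused connection usually means the container is not running or the port mapping differs.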

Highlighted Details

  • Extensive model support across language, multimodal, audio, embedding, and reranker categories, including recent models like Qwen3.5, Llama-3.1, and Gemma-3.
  • Advanced quantization techniques (FP8, INT4, MXFP4) for efficient model serving.
  • Optimized integration with Intel Arc Pro B60 GPUs for performance and scalability.
  • Active development with frequent releases adding new models and framework support.

Maintenance & Community

The project demonstrates active maintenance with frequent updates. Support is primarily handled through GitHub Issues for bug reporting and feature requests. No dedicated community channels (Discord/Slack) are mentioned.

Licensing & Compatibility

License information is not explicitly stated in the provided README content; verify the project's license before any commercial use.

Limitations & Caveats

  • Strict hardware dependency on Intel® Arc™ Pro B60 GPUs.
  • Some "Omni" components are noted as experimental.
  • Potential accuracy issues with specific quantization methods (e.g., FP8 for DeepSeek-OCR-2).
  • Certain models may have specific installation prerequisites or require particular framework versions.
Health Check

  • Last Commit: 13 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 25
  • Issues (30d): 27
  • Star History: 70 stars in the last 30 days
