LLM inference engine optimized for diverse AI accelerators
xLLM: High-Performance LLM Inference Engine for Diverse AI Accelerators
xLLM is an efficient inference framework for Large Language Models (LLMs), optimized in particular for Chinese AI accelerators. It targets enterprises seeking to deploy LLMs with higher efficiency and lower cost, pairing a service-engine decoupled architecture with a set of engine-level optimizations (described below) to improve inference performance.
How It Works
The framework employs a service-engine decoupled architecture. At the service layer, it utilizes elastic scheduling and dynamic request handling. The engine layer incorporates multi-stream parallel computing, graph fusion optimization, speculative inference, dynamic load balancing, and global KV cache management. This combination accelerates inference by overlapping computation and communication, optimizing memory usage, and adapting dynamically to model shapes and workloads, particularly on supported hardware.
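To make the decoupling concrete, below is a minimal, hypothetical sketch in Python: a service layer admits requests and hands them to a separate engine loop through a queue, and the engine drains the queue into dynamic batches. All names here (Request, ServiceLayer, engine_loop) are illustrative assumptions for exposition, not xLLM's actual API.

```python
# Hypothetical sketch of a service-engine decoupled design. These names
# are illustrative only; they are not xLLM's actual API.
import queue
import threading
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    reply: "queue.Queue[str]"  # per-request channel for the result


class ServiceLayer:
    """Service layer: admits requests and hands them off; elastic
    scheduling and load balancing would live here."""

    def __init__(self, pending: queue.Queue):
        self.pending = pending

    def submit(self, prompt: str) -> str:
        req = Request(prompt, queue.Queue(maxsize=1))
        self.pending.put(req)   # decoupled hand-off to the engine
        return req.reply.get()  # block until the engine responds


def engine_loop(pending: queue.Queue, max_batch: int = 4) -> None:
    """Engine layer: drains the queue into dynamic batches. A real engine
    would run fused graphs, speculative decoding, and KV-cache management
    on the accelerator instead of this placeholder echo."""
    while True:
        batch = [pending.get()]  # wait for at least one request
        while len(batch) < max_batch:
            try:
                batch.append(pending.get_nowait())
            except queue.Empty:
                break
        for req in batch:
            req.reply.put(f"echo: {req.prompt}")  # placeholder inference


if __name__ == "__main__":
    pending: queue.Queue = queue.Queue()
    threading.Thread(target=engine_loop, args=(pending,), daemon=True).start()
    print(ServiceLayer(pending).submit("hello"))
```

The point of the hand-off is that the service layer can scale, schedule, and balance requests independently of how the engine batches and executes them.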
Quick Start & Requirements
Installation is primarily via Docker. Users can pull a pre-built image (e.g., xllm/xllm-ai:xllm-0.6.0-dev-hb-rc2-py3.11-oe24.03-lts) and run containers with the necessary device passthrough (--device=/dev/davinci0, etc.) and volume mounts, as in the example below. Alternatively, the project can be built from source by cloning the repository, initializing submodules, installing dependencies via pip, and compiling with setup.py. Key requirements include Ascend AI accelerators and vcpkg for source builds. Official documentation is available at https://xllm.readthedocs.io/zh-cn/latest/ and Docker images at https://hub.docker.com/r/xllm/xllm-ai.
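As a rough example, a container launch might look like the following. The image tag and the --device flag come from the text above; the mount path is an illustrative placeholder, not a verified xLLM option.

```bash
# Pull the pre-built image (tag taken from the text above).
docker pull xllm/xllm-ai:xllm-0.6.0-dev-hb-rc2-py3.11-oe24.03-lts

# Run with Ascend device passthrough. The mount path is a placeholder,
# and additional devices (/dev/davinci1, ...) depend on your host.
docker run -it \
  --device=/dev/davinci0 \
  -v /path/to/models:/models \
  xllm/xllm-ai:xllm-0.6.0-dev-hb-rc2-py3.11-oe24.03-lts
```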
Highlighted Details
Maintenance & Community
The project actively encourages contributions through issue reporting and pull requests. Community support is available via internal Slack channels and a WeChat user group. Several university research labs and numerous developers are acknowledged contributors.
Licensing & Compatibility
xLLM is licensed under the Apache License 2.0, which permits commercial use and modification.
Limitations & Caveats
The framework is heavily optimized for specific Chinese AI accelerators (e.g., Ascend), potentially limiting performance or compatibility on other hardware. The provided Docker image tags suggest the project may be in a development or release candidate stage. Setup requires specific hardware configurations and potentially complex Docker environment management.