vllm-project/vllm-metal: LLM inference acceleration for Apple Silicon
Summary
vLLM Metal is a community-maintained hardware plugin designed to enable high-performance Large Language Model (LLM) inference on Apple Silicon Macs. It integrates the popular vLLM inference engine with Apple's Metal GPU via the MLX compute backend, offering developers and researchers a way to achieve faster LLM inference and more efficient memory utilization on macOS.
How It Works
This plugin establishes a unified compute backend for vLLM, primarily leveraging MLX for accelerated computations such as attention mechanisms, normalization, and positional encodings. It complements MLX with PyTorch for model loading and interoperability. The design exploits Apple Silicon's unified memory architecture, enabling true zero-copy operations that minimize data-transfer overhead and boost performance. The plugin layer abstracts the underlying MetalPlatform, MetalWorker, and MetalModelRunner components, so they integrate seamlessly with vLLM's engine, scheduler, and API server.
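As an illustration of that integration, here is a minimal sketch of offline inference through vLLM's standard LLM API. It assumes the installed plugin registers the Metal platform through vLLM's plugin mechanism, so no backend-specific calls are needed; the model name and sampling settings are placeholders, not values from the README.

```python
# Minimal sketch, assuming vllm-metal is installed and registers itself
# as a vLLM platform plugin, so the standard vLLM API works unchanged.
from vllm import LLM, SamplingParams

# Model name is illustrative; use any model the plugin supports.
llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is unified memory on Apple Silicon?"], params)

for output in outputs:
    print(output.outputs[0].text)
```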
Quick Start & Requirements
Install with the provided script:

```bash
curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh | bash
```

Runtime behavior is configured through environment variables: VLLM_METAL_MEMORY_FRACTION controls memory allocation, VLLM_METAL_USE_MLX enables or disables MLX compute, and VLLM_MLX_DEVICE selects the MLX device (GPU or CPU).
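Because these variables need to be in place before the engine initializes, one option is to set them from Python before importing vLLM, as in the hedged sketch below. The specific values (0.8, 1, gpu) are assumptions chosen for illustration, not documented defaults.

```python
# Sketch of configuring the documented environment variables from Python.
# Set them before vLLM is imported so the engine sees them at startup.
import os

os.environ["VLLM_METAL_MEMORY_FRACTION"] = "0.8"  # fraction of memory to allocate (assumed value)
os.environ["VLLM_METAL_USE_MLX"] = "1"            # enable the MLX compute path (assumed truthy value)
os.environ["VLLM_MLX_DEVICE"] = "gpu"             # run MLX on the Metal GPU rather than the CPU (assumed value)

from vllm import LLM

llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")  # model name is illustrative
```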
Maintenance & Community
No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README excerpt.
Licensing & Compatibility
No licensing information or compatibility notes for commercial use were present in the provided README excerpt.
Limitations & Caveats
The provided README excerpt does not detail specific limitations, known bugs, or alpha/beta status, focusing primarily on features and setup instructions.