vllm-mlx by waybarrios: Run LLMs and multimodal models on Apple Silicon via an OpenAI-compatible API
This project provides a high-throughput, OpenAI-compatible inference server for Large Language Models (LLMs) and Vision-Language Models (VLMs) specifically optimized for Apple Silicon. It targets developers and researchers seeking to leverage their Mac hardware for accelerated AI tasks, offering multimodal capabilities (text, image, video, audio) and efficient memory management through native MLX integration.
How It Works
vLLM-MLX integrates Apple's MLX framework, including mlx-lm for LLM inference, mlx-vlm for multimodal processing, and mlx-audio for speech tasks, into a vLLM-like serving architecture. This approach utilizes MLX's unified memory and Metal kernels for native GPU acceleration on Apple Silicon. Key optimizations include paged KV cache for memory efficiency and continuous batching to maximize throughput for concurrent user requests.
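The paged KV cache mentioned above can be illustrated with a minimal block allocator: sequences own lists of fixed-size blocks that are handed back on completion, so memory fragments far less than with contiguous per-sequence buffers. This is an illustrative sketch, not vllm-mlx's actual API; the names `BlockAllocator` and `BLOCK_SIZE` are assumptions.

```python
# Illustrative paged KV cache allocator (names are hypothetical,
# not taken from vllm-mlx).
BLOCK_SIZE = 16  # tokens per KV cache block

class BlockAllocator:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.tables: dict[int, list[int]] = {}  # seq_id -> block ids

    def append_token(self, seq_id: int, position: int) -> int:
        """Return the block holding `position`, allocating at boundaries."""
        table = self.tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:  # crossed a block boundary
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt a sequence")
            table.append(self.free_blocks.pop())
        return table[position // BLOCK_SIZE]

    def free(self, seq_id: int) -> None:
        """Release all blocks of a finished sequence for reuse."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))

alloc = BlockAllocator(num_blocks=4)
for pos in range(20):            # a 20-token sequence spans two blocks
    alloc.append_token(seq_id=0, position=pos)
print(len(alloc.tables[0]))      # 2 (16 tokens + 4 tokens)
alloc.free(0)
print(len(alloc.free_blocks))    # 4: all blocks reusable again
```

Continuous batching builds on the same idea: because blocks are freed as soon as a sequence finishes, newly arriving requests can be admitted mid-flight instead of waiting for the whole batch to drain.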
Quick Start & Requirements
Clone the repository (git clone https://github.com/waybarrios/vllm-mlx.git), navigate into the directory (cd vllm-mlx), and install using pip (pip install -e .). For audio support, install the extra (pip install vllm-mlx[audio]), download the spaCy model (python -m spacy download en_core_web_sm), and on macOS run brew install espeak-ng. Further guides live in the docs directory.
Highlighted Details
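Once installed and serving, the server accepts standard OpenAI chat-completions requests. A minimal request sketch follows; the host, port, and model name are assumptions, so adjust them to however you launched the server.

```python
import json

# Hypothetical endpoint and model name; substitute your own.
URL = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello from Apple Silicon"}],
    "max_tokens": 64,
}
body = json.dumps(payload)
# POST `body` to URL with any HTTP client, e.g.:
#   curl -X POST $URL -H 'Content-Type: application/json' -d "$body"
print(json.loads(body)["messages"][0]["role"])  # user
```

Because the schema matches OpenAI's, existing OpenAI client libraries can be pointed at the local server by overriding their base URL.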
One noted caveat involves the GEMMA3_SLIDING_WINDOW setting, which requires manual code modification in mlx-vlm.
Maintenance & Community
The README does not detail specific community channels (e.g., Discord, Slack), notable contributors beyond the primary author (Wayner Barrios), sponsorships, or a public roadmap. Contributions are welcomed via pull requests.
Licensing & Compatibility
Limitations & Caveats
Audio features depend on a system-level dependency (espeak-ng) and specific Python package extras.