SwiftLM: Blazingly fast LLM inference for Apple Silicon
Top 95.4% on SourcePulse
Summary
SwiftLM is a high-performance, native Swift inference server for Apple Silicon, designed to run MLX models directly without a Python runtime. It offers an OpenAI-compatible API, targeting developers and power users seeking maximum efficiency and bare-metal speed on macOS and iOS. The project eliminates Python overhead and GIL contention, providing a streamlined path to LLM deployment on Apple hardware.
How It Works
This project leverages Swift, Apple's MLX framework, and Metal for native GPU acceleration, compiling into a single, efficient binary. It bypasses Python and its associated performance bottlenecks. Key architectural choices include TurboQuantization for aggressive KV cache compression (achieving ~3.6 bits per coordinate) and experimental SSD Expert Streaming, which offloads Mixture of Experts (MoE) layers directly from NVMe SSDs to the GPU command buffer. This approach mitigates macOS unified memory limitations, enabling larger models on constrained hardware.
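The ~3.6 bits-per-coordinate figure implies a sizable KV cache saving over fp16. A back-of-envelope sketch (the model dimensions below are illustrative assumptions, not SwiftLM internals):

```python
# Back-of-envelope KV cache sizing: fp16 (16 bits per value) versus the
# ~3.6 bits per value cited for TurboQuantization. The layer/head/dim
# numbers are hypothetical, chosen to resemble a 3B-class model.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value):
    # Two tensors (K and V) per layer; one value per head coordinate per token.
    values = 2 * layers * kv_heads * head_dim * seq_len
    return values * bits_per_value / 8

# Hypothetical 3B-class model at a 32k-token context:
fp16 = kv_cache_bytes(layers=36, kv_heads=8, head_dim=128,
                      seq_len=32768, bits_per_value=16)
quant = kv_cache_bytes(layers=36, kv_heads=8, head_dim=128,
                       seq_len=32768, bits_per_value=3.6)
print(f"fp16: {fp16 / 2**30:.2f} GiB, ~3.6-bit: {quant / 2**30:.2f} GiB")
```

Whatever the exact model shape, the ratio is fixed at 16/3.6, roughly a 4.4x reduction in cache footprint.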
Quick Start & Requirements
- Build with the included ./build.sh script.
- Run with --model <model_id> and --port <port>, e.g., ./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413.
- Models download automatically if not cached.
- Requires the Metal toolchain (xcodebuild -downloadComponent MetalToolchain).

Highlighted Details
- Serves an OpenAI-compatible API at /v1/chat/completions for seamless integration with existing OpenAI SDKs.

Maintenance & Community
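Because the server speaks the standard /v1/chat/completions protocol, any OpenAI client can talk to it. A minimal standard-library sketch (the port follows the quick-start example; the model id is an illustrative assumption, matching whichever model the server was launched with):

```python
import json

# Request body in the OpenAI chat-completions format. SwiftLM exposes the
# endpoint at POST http://localhost:5413/v1/chat/completions (port taken
# from the quick-start example above).
payload = {
    "model": "mlx-community/Qwen2.5-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "stream": False,
}
body = json.dumps(payload).encode("utf-8")

# To actually send it (requires a running SwiftLM server):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:5413/v1/chat/completions", data=body,
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Pointing an existing OpenAI SDK at the same base URL (with any dummy API key) works equally well, since the request and response shapes are unchanged.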
The project relies heavily on the MLX community and various open-source projects for its foundation. Specific community channels or direct contributor information are not detailed in the README.
Licensing & Compatibility
Licensed under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes.
Limitations & Caveats
The SSD Expert Streaming feature is experimental. Aggressive quantization (e.g., 2-bit) can destabilize models and break features such as OpenAI-compatible tool calling, since corrupted output no longer conforms to the expected JSON grammar.