antirez/ds4: Fast local inference for DeepSeek V4 Flash models
Top 15.6% on SourcePulse
Summary
ds4 is a specialized, native inference engine for DeepSeek V4 Flash, optimized for Apple GPUs via Metal. It enables high-performance local LLM execution on high-end Macs, featuring long context windows and efficient on-disk KV cache persistence.
How It Works
This project utilizes a DeepSeek V4 Flash-specific Metal graph executor with custom loading and KV state management. Its advantage lies in DeepSeek V4 Flash's speed, proportional thinking, and 1 million token context window. A key innovation is treating the KV cache as a "first-class disk citizen," leveraging fast SSDs for long-context persistence, with development significantly assisted by GPT 5.5.
Quick Start & Requirements
Build with make. Download a supported DeepSeek V4 Flash GGUF model via ./download_model.sh (e.g., q2 for 128GB RAM, q4 for >= 256GB RAM). Run ./ds4, or start a server with ./ds4-server.
Highlighted Details
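Assuming the repository's documented scripts, a typical session looks like the following (the q2 argument is one of the quantization names mentioned above; exact script arguments may differ):

```shell
# Build the engine (Apple Silicon Mac with the Metal toolchain assumed).
make

# Fetch a quantized model: q2 for 128GB RAM, q4 for >= 256GB RAM.
./download_model.sh q2

# Run interactively...
./ds4

# ...or start a local server instead.
./ds4-server
```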
Maintenance & Community
The project acknowledges contributions from the llama.cpp community and notes significant AI assistance from GPT 5.5. No specific community channels or roadmap links are provided.
Licensing & Compatibility
While adapted pieces use the MIT license and the GGML authors' copyright is noted, the overall project license is not explicitly stated. It is strictly compatible only with the project's specially crafted DeepSeek V4 Flash GGUF files.
Limitations & Caveats
This is "alpha quality code" primarily for Metal GPUs; the CPU path is unstable and for debugging only. The engine is model-specific and developed with significant AI assistance.