deepseek.cpp by andrewkchan

CPU inference for DeepSeek LLMs in C++

created 7 months ago
308 stars

Top 88.2% on sourcepulse

Project Summary

This C++ project provides CPU-only inference for the DeepSeek family of large language models, targeting users who need efficient, hackable, and self-contained LLM execution without GPU dependencies. It offers a lean alternative to larger inference engines, enabling focused study of DeepSeek model performance on CPU.

How It Works

The implementation is based on Yet Another Language Model (YALM) and is tailored to DeepSeek architectures. To reduce memory usage and improve CPU throughput, it supports two quantization schemes: f8e5m2 (an 8-bit float format applied in 128x128 blocks, with MoE gates and layer norms kept at full precision) and q2_k (llama.cpp's 2-bit K-quantization). The project prioritizes simplicity and hackability, and its codebase is significantly smaller than those of other inference engines.
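To make the f8e5m2 idea concrete, here is a minimal sketch of block-wise 8-bit float quantization. It is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the per-block scale rule (max absolute value mapped to the largest finite f8e5m2 value) and the truncating rounding are assumptions, and a tiny 2x2 block stands in for the 128x128 blocks mentioned above. It relies on the fact that f8e5m2 shares its sign and exponent layout with IEEE fp16, so a code can be formed from the top byte of the fp16 bit pattern.

```python
import struct

# Largest finite f8e5m2 value: 1.75 * 2**15.
F8E5M2_MAX = 57344.0


def f8e5m2_encode(x: float) -> int:
    """Encode x as an 8-bit code by keeping the top byte of its fp16 bits.

    Assumption: mantissa bits are truncated, not rounded to nearest.
    """
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 8


def f8e5m2_decode(code: int) -> float:
    """Decode an 8-bit code by restoring it as the top byte of an fp16 value."""
    (x,) = struct.unpack("<e", struct.pack("<H", code << 8))
    return x


def quantize_block(block):
    """Quantize one block (list of rows of floats) to (scale, codes).

    Assumption: one scale per block, chosen so the largest magnitude
    in the block maps to F8E5M2_MAX.
    """
    max_abs = max((abs(v) for row in block for v in row), default=0.0)
    scale = max_abs / F8E5M2_MAX if max_abs > 0 else 1.0
    codes = [[f8e5m2_encode(v / scale) for v in row] for row in block]
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate floats from a block's scale and codes."""
    return [[f8e5m2_decode(c) * scale for c in row] for row in codes]


block = [[1.0, -2.0], [0.5, 4.0]]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
```

With only two mantissa bits, truncation can lose up to roughly a quarter of a value's magnitude, which is one reason a scheme like this would keep sensitive tensors such as MoE gates and layer norms at full precision.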

Quick Start & Requirements

  • Install: pip install . (after cloning the repo and installing git-lfs and build tools).
  • Prerequisites: C++20-compatible compiler, Python 3.x, git-lfs, python3-dev, build-essential.
  • Model Conversion: Requires Hugging Face format safetensor weights, converted using python convert.py --quant <quant_type> <model_dir>.
  • Execution: ./build/main <model_weights_dir> -i "prompt"
  • Performance Tuning: Set the OMP_NUM_THREADS environment variable; throughput depends heavily on the thread count.
  • Resources: DeepSeek V3 (F8E5M2) requires ~650GB RAM; Q2_K requires ~206GB RAM.
  • Docs: CLI help available via ./build/main -h.
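The execution and tuning steps above can be combined in a small launcher. This is a sketch only: the helper names are hypothetical, the binary path and the -i flag are taken from the command shown above, and the right thread count for a given machine is something to find by experiment.

```python
import os
import subprocess


def build_run(model_dir: str, prompt: str, threads: int):
    """Return (argv, env) for one run of ./build/main with a fixed thread count."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    argv = ["./build/main", model_dir, "-i", prompt]
    # Copy the current environment and pin the OpenMP thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    return argv, env


def run(model_dir: str, prompt: str, threads: int):
    """Launch the CLI; raises CalledProcessError on a non-zero exit code."""
    argv, env = build_run(model_dir, prompt, threads)
    return subprocess.run(argv, env=env, check=True)
```

Calling run() with a few different thread counts and comparing throughput is a simple way to pick a value for OMP_NUM_THREADS.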

Highlighted Details

  • CPU-only inference for DeepSeek models.
  • Quantized weight formats (f8e5m2, q2_k) that balance accuracy against memory use and speed.
  • Small codebase (<2k LOC excluding dependencies), emphasizing hackability.
  • Supports various DeepSeek model versions and weight formats (FP32, FP16, F8E5M2, Q2_K).

Maintenance & Community

This is a personal side project for learning and experimentation. Contributions (PRs) are welcome.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Only decoding (incremental generation) is implemented; prefill operations and optimizations like speculative decoding are missing. Some DeepSeek V3 architectural features are not yet implemented, potentially impacting accuracy. Models may exhibit repetitive behavior at low temperatures.
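The note about low temperatures follows from how temperature scaling reshapes the sampling distribution. The sketch below shows standard temperature-scaled softmax; the function name is hypothetical and this is not taken from the project's sampler.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
p_default = softmax_with_temperature(logits, 1.0)  # top token gets about two thirds
p_cold = softmax_with_temperature(logits, 0.1)     # top token gets nearly all the mass
```

At low temperature almost all probability sits on the single most likely token, so sampling becomes close to greedy decoding, which is prone to loops and repeated phrases.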

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 90 days
