DeepSeek-V3 is a 671B parameter Mixture-of-Experts (MoE) language model designed for high performance across diverse tasks, including coding, math, and multilingual understanding. It targets researchers and developers seeking state-of-the-art open-source LLM capabilities, offering performance competitive with leading closed-source models.
How It Works
DeepSeek-V3 comprises 671B total parameters, of which 37B are activated per token, and uses Multi-head Latent Attention (MLA) and DeepSeekMoE for efficient training and inference. It pioneers an auxiliary-loss-free strategy for expert load balancing and a Multi-Token Prediction (MTP) objective that improves performance and enables speculative decoding. The model was pre-trained on 14.8 trillion tokens using an FP8 mixed-precision framework that overcomes communication bottlenecks for efficient scaling. Post-training, knowledge distillation from the Chain-of-Thought-focused DeepSeek-R1 series further refines its reasoning abilities.
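To make the load-balancing idea concrete, here is a minimal PyTorch sketch of the bias-based, auxiliary-loss-free routing described in the technical report; the names, expert counts, and update constant below are illustrative, not the repository's actual implementation:

```python
# Minimal sketch of auxiliary-loss-free MoE load balancing in the spirit of
# DeepSeek-V3. ROUTED_EXPERTS, TOP_K, and BIAS_UPDATE_SPEED are illustrative.
import torch

ROUTED_EXPERTS, TOP_K, BIAS_UPDATE_SPEED = 8, 2, 0.001

def route(scores: torch.Tensor, bias: torch.Tensor):
    """scores: [tokens, experts] sigmoid affinities; bias: [experts]."""
    # The bias influences *which* experts are selected...
    topk_idx = torch.topk(scores + bias, TOP_K, dim=-1).indices
    # ...but the gating weights are computed from raw affinities only.
    gate = torch.gather(scores, -1, topk_idx)
    gate = gate / gate.sum(dim=-1, keepdim=True)  # normalize selected weights
    return topk_idx, gate

def update_bias(bias: torch.Tensor, topk_idx: torch.Tensor):
    # Count how many tokens each expert received in this batch.
    load = torch.bincount(topk_idx.flatten(), minlength=ROUTED_EXPERTS).float()
    # Nudge bias down for overloaded experts and up for underloaded ones,
    # steering future routing toward balance without an auxiliary loss term.
    bias -= BIAS_UPDATE_SPEED * torch.sign(load - load.mean())
    return bias

scores = torch.sigmoid(torch.randn(16, ROUTED_EXPERTS))  # toy affinities
bias = torch.zeros(ROUTED_EXPERTS)
idx, gate = route(scores, bias)
bias = update_bias(bias, idx)
```

Because the bias affects only expert selection and not the gating weights, balance is steered without distorting each token's output mixture, which is the motivation the report gives for dropping the auxiliary loss.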
Quick Start & Requirements
- Installation: Local deployment is supported via multiple inference frameworks, including DeepSeek-Infer Demo, SGLang, LMDeploy, TensorRT-LLM, vLLM, and LightLLM (see the serving sketch after this list).
- Prerequisites: Linux (Python 3.10+ for DeepSeek-Infer Demo), PyTorch 2.4.1, Triton 3.0.0, Transformers 4.46.3. FP8 inference is natively supported; BF16 weights can be generated via a provided script. AMD GPU support is available via SGLang. Huawei Ascend NPU support is available via MindIE.
- Resources: Model weights are available on Hugging Face. Detailed inference instructions and framework-specific setup guides are provided.
- Links: DeepSeek-V3 Hugging Face, How to Run Locally, SGLang, LMDeploy, TensorRT-LLM, vLLM, LightLLM
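As one illustration of local deployment, offline inference through vLLM (one of the supported frameworks above) can look like the sketch below. The model ID is the public Hugging Face repo; the parallelism, context-length, and sampling values are placeholders that must be tuned to your hardware, and serving the full 671B model requires a multi-GPU (typically multi-node) setup:

```python
# Minimal vLLM offline-inference sketch for DeepSeek-V3. The settings below
# are placeholders, not recommended values; requires a vLLM version with
# DeepSeek-V3 support and sufficient GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # public Hugging Face weights
    tensor_parallel_size=8,           # shard across available GPUs
    trust_remote_code=True,
    max_model_len=4096,               # smaller window reduces KV-cache memory
)
params = SamplingParams(temperature=0.3, max_tokens=256)
outputs = llm.generate(["Explain Mixture-of-Experts in one paragraph."], params)
print(outputs[0].outputs[0].text)
```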
Highlighted Details
- Achieves state-of-the-art results among open-source models on numerous benchmarks, with performance comparable to leading closed-source models such as GPT-4o and Claude-3.5-Sonnet.
- Demonstrates strong base-model capabilities in coding (HumanEval: 65.2 Pass@1) and mathematics (GSM8K: 89.3 EM).
- Supports a 128K context window with strong performance on Needle In A Haystack tests.
- Offers FP8 inference support, enhancing efficiency.
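The report's FP8 framework relies on fine-grained scaling (e.g., 128x128 blocks for weights) rather than a single scale per tensor. A self-contained PyTorch sketch of that block-wise idea follows; the block size and dtype handling are assumptions for illustration, not the repository's actual kernels:

```python
# Illustrative block-wise FP8 weight quantization in the spirit of
# DeepSeek-V3's fine-grained scheme: one scale per 128x128 weight block.
import torch

BLOCK = 128
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448 for e4m3

def quantize_blockwise(w: torch.Tensor):
    """w: [out, in], with both dims divisible by BLOCK."""
    o, i = w.shape
    blocks = w.reshape(o // BLOCK, BLOCK, i // BLOCK, BLOCK)
    # One scale per block keeps a single outlier from degrading the whole tensor.
    scale = blocks.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12) / FP8_MAX
    q = (blocks / scale).to(torch.float8_e4m3fn)
    return q, scale

def dequantize_blockwise(q: torch.Tensor, scale: torch.Tensor, shape):
    return (q.to(torch.float32) * scale).reshape(shape)

w = torch.randn(256, 256)
q, s = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s, w.shape)
print((w - w_hat).abs().max())  # small block-wise quantization error
```

Per-block scales are the rationale the report gives for fine-grained scaling: outliers stay confined to their own block instead of inflating the quantization step everywhere.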
Licensing & Compatibility
- Code repository is licensed under MIT. Model weights (Base/Chat) are subject to a separate Model License.
- Commercial use is permitted for DeepSeek-V3 series models.
Limitations & Caveats
- Mac and Windows operating systems are not supported for the DeepSeek-Infer Demo.
- Multi-Token Prediction (MTP) support is under active development within the community.
- TensorRT-LLM FP8 support is coming soon.