DeepSeek-V3 is a 671B parameter Mixture-of-Experts (MoE) language model designed for high performance across diverse tasks, including coding, math, and multilingual understanding. It targets researchers and developers seeking state-of-the-art open-source LLM capabilities, offering performance competitive with leading closed-source models.
How It Works
DeepSeek-V3 comprises 671B total parameters, of which 37B are activated per token, and uses Multi-head Latent Attention (MLA) and DeepSeekMoE for efficient training and inference. It pioneers an auxiliary-loss-free strategy for expert load balancing and a Multi-Token Prediction (MTP) objective that improves performance and enables speculative decoding. The model was pre-trained on 14.8 trillion tokens using an FP8 mixed-precision framework that overcomes communication bottlenecks for efficient scaling. Post-training, knowledge distillation from the Chain-of-Thought-focused DeepSeek-R1 series further refines its reasoning abilities.
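To make the load-balancing idea concrete, here is a minimal PyTorch sketch of the bias-based, auxiliary-loss-free routing described in the technical report; the names, expert counts, and update constant below are illustrative, not the repository's actual implementation:

```python
# Minimal sketch of auxiliary-loss-free MoE load balancing in the spirit of
# DeepSeek-V3. ROUTED_EXPERTS, TOP_K, and BIAS_UPDATE_SPEED are illustrative.
import torch

ROUTED_EXPERTS, TOP_K, BIAS_UPDATE_SPEED = 8, 2, 0.001

def route(scores: torch.Tensor, bias: torch.Tensor):
    """scores: [tokens, experts] sigmoid affinities; bias: [experts]."""
    # The bias influences *which* experts are selected...
    topk_idx = torch.topk(scores + bias, TOP_K, dim=-1).indices
    # ...but the gating weights are computed from raw affinities only.
    gate = torch.gather(scores, -1, topk_idx)
    gate = gate / gate.sum(dim=-1, keepdim=True)  # normalize selected weights
    return topk_idx, gate

def update_bias(bias: torch.Tensor, topk_idx: torch.Tensor):
    # Count how many tokens each expert received in this batch.
    load = torch.bincount(topk_idx.flatten(), minlength=ROUTED_EXPERTS).float()
    # Nudge bias down for overloaded experts and up for underloaded ones,
    # steering future routing toward balance without an auxiliary loss term.
    bias -= BIAS_UPDATE_SPEED * torch.sign(load - load.mean())
    return bias

scores = torch.sigmoid(torch.randn(16, ROUTED_EXPERTS))  # toy affinities
bias = torch.zeros(ROUTED_EXPERTS)
idx, gate = route(scores, bias)
bias = update_bias(bias, idx)
```

Because the bias affects only expert selection and not the gating weights, balance is steered without distorting each token's output mixture, which is the motivation the report gives for dropping the auxiliary loss.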
Quick Start & Requirements
- Installation: Local deployment is supported via multiple inference frameworks, including DeepSeek-Infer Demo, SGLang, LMDeploy, TensorRT-LLM, vLLM, and LightLLM (see the serving sketch after this list).
- Prerequisites: Linux (Python 3.10+ for DeepSeek-Infer Demo), PyTorch 2.4.1, Triton 3.0.0, Transformers 4.46.3. FP8 inference is natively supported; BF16 weights can be generated via a provided script. AMD GPU support is available via SGLang. Huawei Ascend NPU support is available via MindIE.
- Resources: Model weights are available on Hugging Face. Detailed inference instructions and framework-specific setup guides are provided.
- Links: DeepSeek-V3 Hugging Face, How to Run Locally, SGLang, LMDeploy, TensorRT-LLM, vLLM, LightLLM
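As one illustration of local deployment, offline inference through vLLM (one of the supported frameworks above) can look like the sketch below. The model ID is the public Hugging Face repo; the parallelism, context-length, and sampling values are placeholders that must be tuned to your hardware, and serving the full 671B model requires a multi-GPU (typically multi-node) setup:

```python
# Minimal vLLM offline-inference sketch for DeepSeek-V3. The settings below
# are placeholders, not recommended values; requires a vLLM version with
# DeepSeek-V3 support and sufficient GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # public Hugging Face weights
    tensor_parallel_size=8,           # shard across available GPUs
    trust_remote_code=True,
    max_model_len=4096,               # smaller window reduces KV-cache memory
)
params = SamplingParams(temperature=0.3, max_tokens=256)
outputs = llm.generate(["Explain Mixture-of-Experts in one paragraph."], params)
print(outputs[0].outputs[0].text)
```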
Highlighted Details
- Achieves state-of-the-art results among open-source models on numerous benchmarks, with performance comparable to leading closed-source models such as GPT-4o and Claude-3.5-Sonnet.
- Demonstrates strong base-model capabilities in coding (HumanEval: 65.2 Pass@1) and mathematics (GSM8K: 89.3 EM).
- Supports a 128K context window with strong performance on Needle In A Haystack tests.
- Offers FP8 inference support, enhancing efficiency.
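The report's FP8 framework relies on fine-grained scaling (e.g., 128x128 blocks for weights) rather than a single scale per tensor. A self-contained PyTorch sketch of that block-wise idea follows; the block size and dtype handling are assumptions for illustration, not the repository's actual kernels:

```python
# Illustrative block-wise FP8 weight quantization in the spirit of
# DeepSeek-V3's fine-grained scheme: one scale per 128x128 weight block.
import torch

BLOCK = 128
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448 for e4m3

def quantize_blockwise(w: torch.Tensor):
    """w: [out, in], with both dims divisible by BLOCK."""
    o, i = w.shape
    blocks = w.reshape(o // BLOCK, BLOCK, i // BLOCK, BLOCK)
    # One scale per block keeps a single outlier from degrading the whole tensor.
    scale = blocks.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12) / FP8_MAX
    q = (blocks / scale).to(torch.float8_e4m3fn)
    return q, scale

def dequantize_blockwise(q: torch.Tensor, scale: torch.Tensor, shape):
    return (q.to(torch.float32) * scale).reshape(shape)

w = torch.randn(256, 256)
q, s = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s, w.shape)
print((w - w_hat).abs().max())  # small block-wise quantization error
```

Per-block scales are the rationale the report gives for fine-grained scaling: outliers stay confined to their own block instead of inflating the quantization step everywhere.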
Licensing & Compatibility
- Code repository is licensed under MIT. Model weights (Base/Chat) are subject to a separate Model License.
- Commercial use is permitted for DeepSeek-V3 series models.
Limitations & Caveats
- Mac and Windows operating systems are not supported for the DeepSeek-Infer Demo.
- Multi-Token Prediction (MTP) support is under active development within the community.
- TensorRT-LLM FP8 support is coming soon.