DeepSeek-V3 by deepseek-ai

MoE language model research paper with 671B total parameters

created 7 months ago
98,433 stars

Top 0.1% on sourcepulse

View on GitHub
Project Summary

DeepSeek-V3 is a 671B parameter Mixture-of-Experts (MoE) language model designed for high performance across diverse tasks, including coding, math, and multilingual understanding. It targets researchers and developers seeking state-of-the-art open-source LLM capabilities, offering performance competitive with leading closed-source models.

How It Works

DeepSeek-V3 leverages a 671B total parameter architecture with 37B activated parameters per token, utilizing Multi-head Latent Attention (MLA) and DeepSeekMoE for efficiency. It pioneers an auxiliary-loss-free strategy for load balancing and a Multi-Token Prediction (MTP) objective for enhanced performance and speculative decoding. The model was trained on 14.8 trillion tokens using an FP8 mixed-precision framework, overcoming communication bottlenecks for efficient scaling. Post-training knowledge distillation from a Chain-of-Thought model (DeepSeek-R1) further refines its reasoning abilities.
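
To make the auxiliary-loss-free load-balancing idea concrete: each expert carries a bias that is added to its routing score only when selecting the top-k experts, and that bias is nudged toward uniform expert load after each step instead of using an auxiliary loss. The sketch below is a conceptual Python/PyTorch illustration, not the repository's implementation; the expert count, top-k, and update speed are made-up values.

```python
# Conceptual sketch of auxiliary-loss-free MoE load balancing (illustrative only).
import torch

n_experts, top_k, bias_update_speed = 8, 2, 0.001   # assumed toy values
expert_bias = torch.zeros(n_experts)                # adjusted online, not via a loss term

def route(affinity: torch.Tensor):
    """affinity: [n_tokens, n_experts] token-to-expert scores (e.g. sigmoid of logits)."""
    # The bias influences *which* experts are selected ...
    _, topk_idx = (affinity + expert_bias).topk(top_k, dim=-1)
    # ... but the gating weights are computed from the unbiased scores.
    gates = torch.gather(affinity, 1, topk_idx)
    gates = gates / gates.sum(dim=-1, keepdim=True)
    return topk_idx, gates

def update_bias(topk_idx: torch.Tensor):
    """Lower the bias of overloaded experts and raise underloaded ones."""
    load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
    overloaded = load > load.mean()
    expert_bias[overloaded] -= bias_update_speed
    expert_bias[~overloaded] += bias_update_speed
```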

Quick Start & Requirements

  • Installation: Local deployment is supported via multiple inference frameworks, including DeepSeek-Infer Demo, SGLang, LMDeploy, TensorRT-LLM, vLLM, and LightLLM; a minimal client sketch follows this list.
  • Prerequisites: Linux (Python 3.10+ for DeepSeek-Infer Demo), PyTorch 2.4.1, Triton 3.0.0, Transformers 4.46.3. FP8 inference is natively supported; BF16 weights can be generated via a provided script. AMD GPU support is available via SGLang. Huawei Ascend NPU support is available via MindIE.
  • Resources: Model weights are available on Hugging Face. Detailed inference instructions and framework-specific setup guides are provided.
  • Links: DeepSeek-V3 Hugging Face, How to Run Locally, SGLang, LMDeploy, TensorRT-LLM, vLLM, LightLLM
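
Several of these frameworks expose an OpenAI-compatible HTTP endpoint once the model is served. The sketch below is a minimal client call under that assumption; the server choice (e.g. SGLang or vLLM from the links above), the port, and the model identifier are illustrative and should be taken from the framework's own setup guide.

```python
# Minimal client sketch, assuming DeepSeek-V3 is already served locally behind an
# OpenAI-compatible endpoint. The base_url port and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Summarize Multi-head Latent Attention in one sentence."}],
    max_tokens=128,
    temperature=0.3,
)
print(resp.choices[0].message.content)
```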

Highlighted Details

  • Achieves state-of-the-art performance on numerous benchmarks, outperforming other open-source models and matching closed-source competitors.
  • Demonstrates strong capabilities in coding (HumanEval: 65.2 Pass@1) and mathematics (GSM8K: 89.3 EM).
  • Supports a 128K context window with strong performance on Needle In A Haystack tests.
  • Offers FP8 inference support, enhancing efficiency.

Licensing & Compatibility

  • Code repository is licensed under MIT. Model weights (Base/Chat) are subject to a separate Model License.
  • Commercial use is permitted for DeepSeek-V3 series models.

Limitations & Caveats

  • Mac and Windows operating systems are not supported for the DeepSeek-Infer Demo.
  • Multi-Token Prediction (MTP) support is under active development within the community.
  • TensorRT-LLM FP8 support is coming soon.
Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 4
  • Issues (30d): 30

Star History
3,069 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

Top 0.1% on sourcepulse
5k stars
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

Top 0.4% on sourcepulse
6k stars
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse
25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 6 more.

DeepSeek-R1 by deepseek-ai

Top 0.1% on sourcepulse
91k stars
Reasoning models research paper
created 6 months ago
updated 1 month ago