deepseek-ai/DeepSeek-V3: MoE language model research paper with 671B total parameters
Top 0.1% on SourcePulse
DeepSeek-V3 is a 671B parameter Mixture-of-Experts (MoE) language model designed for high performance across diverse tasks, including coding, math, and multilingual understanding. It targets researchers and developers seeking state-of-the-art open-source LLM capabilities, offering performance competitive with leading closed-source models.
How It Works
DeepSeek-V3 leverages a 671B total parameter architecture with 37B activated parameters per token, utilizing Multi-head Latent Attention (MLA) and DeepSeekMoE for efficiency. It pioneers an auxiliary-loss-free strategy for load balancing and a Multi-Token Prediction (MTP) objective for enhanced performance and speculative decoding. The model was trained on 14.8 trillion tokens using an FP8 mixed-precision framework, overcoming communication bottlenecks for efficient scaling. Post-training knowledge distillation from a Chain-of-Thought model (DeepSeek-R1) further refines its reasoning abilities.
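The auxiliary-loss-free load-balancing idea can be illustrated with a short sketch: rather than adding a balance term to the training loss, a per-expert bias is applied to the routing scores and nudged up or down according to each expert's recent load, so balancing never perturbs the task gradients. The class name, update rule, and parameters below are illustrative assumptions for a simplified top-k router, not code from the DeepSeek-V3 repository.

```python
# Minimal sketch of auxiliary-loss-free load balancing for a top-k MoE router.
# Names and the bias-update rule are illustrative; see the DeepSeek-V3 report
# for the method actually used in the model.
import numpy as np

class BiasBalancedRouter:
    def __init__(self, n_experts: int, top_k: int, bias_step: float = 0.001):
        self.n_experts = n_experts
        self.top_k = top_k
        self.bias_step = bias_step
        self.bias = np.zeros(n_experts)  # per-expert bias, used only for expert selection

    def route(self, affinity: np.ndarray) -> np.ndarray:
        """affinity: (n_tokens, n_experts) token-to-expert scores.
        Returns a boolean (n_tokens, n_experts) assignment mask."""
        biased = affinity + self.bias          # bias shifts which experts are picked
        top_idx = np.argpartition(-biased, self.top_k - 1, axis=1)[:, :self.top_k]
        mask = np.zeros_like(affinity, dtype=bool)
        np.put_along_axis(mask, top_idx, True, axis=1)
        self._update_bias(mask)
        return mask

    def _update_bias(self, mask: np.ndarray) -> None:
        # Lower the bias of overloaded experts and raise it for underloaded ones,
        # so no auxiliary loss term is needed in the training objective.
        load = mask.sum(axis=0)
        self.bias -= self.bias_step * np.sign(load - load.mean())

# Usage: route a batch of 8 tokens across 16 experts, activating 2 per token.
router = BiasBalancedRouter(n_experts=16, top_k=2)
assignment = router.route(np.random.rand(8, 16))
print(assignment.sum(axis=0))  # per-expert token counts
```

Because the bias only influences expert selection and not the gating weights applied to expert outputs, the gradients the model trains on remain those of the task loss alone.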
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats