DeepSeek-V2 by deepseek-ai

MoE language model for research/API use

Created 1 year ago
4,938 stars

Top 10.1% on SourcePulse

View on GitHub
Project Summary

DeepSeek-V2 is a powerful Mixture-of-Experts (MoE) language model designed for strong performance, economical training, and efficient inference. It targets researchers and developers who need high-quality language generation and coding capabilities, and it is substantially cheaper to train and faster at inference than its dense predecessor, DeepSeek 67B.

How It Works

DeepSeek-V2 employs two key architectural innovations: Multi-head Latent Attention (MLA), which enables efficient inference by compressing the key-value cache into a compact latent vector, and the DeepSeekMoE architecture for the FFN layers, which enables cost-effective training of larger models. The combination yields a model with 236B total parameters of which only 21B are activated per token; compared with DeepSeek 67B, the project reports a 93.3% reduction in KV cache and up to 5.76x higher maximum generation throughput.
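
For intuition, here is a minimal sketch of the latent-KV idea behind MLA. It is illustrative only, not the model's actual implementation: the dimensions are toy values and details such as the decoupled RoPE key path are omitted. Only a small latent vector per token is cached, and full keys and values are reconstructed from it at attention time.

    import torch
    import torch.nn as nn

    class ToyLatentKV(nn.Module):
        """Toy low-rank KV compression in the spirit of MLA (not the reference code)."""
        def __init__(self, d_model=1024, n_heads=8, d_head=128, d_latent=128):
            super().__init__()
            self.down = nn.Linear(d_model, d_latent, bias=False)           # compress hidden state
            self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild keys
            self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild values
            self.n_heads, self.d_head = n_heads, d_head

        def forward(self, hidden):                       # hidden: [batch, seq, d_model]
            latent = self.down(hidden)                   # [batch, seq, d_latent] -- only this is cached
            b, s, _ = latent.shape
            k = self.up_k(latent).view(b, s, self.n_heads, self.d_head)
            v = self.up_v(latent).view(b, s, self.n_heads, self.d_head)
            return latent, k, v

    # Per-token cache cost drops from 2 * n_heads * d_head values to d_latent values,
    # which is the source of the large KV-cache savings.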

Quick Start & Requirements

  • Hugging Face Transformers: BF16 inference requires 8 x 80GB-class GPUs; the snippet below caps allocation at 75GB per device.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_name = "deepseek-ai/DeepSeek-V2"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    # Cap per-GPU memory so the 236B-parameter model is sharded across 8 devices.
    max_memory = {i: "75GB" for i in range(8)}
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, device_map="sequential",
        torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
    
  • SGLang (Recommended): Supports MLA, FP8, and Torch Compile for optimal performance (a client query sketch follows this list).
    python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V2-Chat --tp 8 --trust-remote-code
    
  • vLLM (Recommended): Requires merging a dedicated pull request into your vLLM build (a generation sketch follows this list).
    from vllm import LLM, SamplingParams
    llm = LLM(model="deepseek-ai/DeepSeek-V2-Chat", tensor_parallel_size=8, max_model_len=8192, trust_remote_code=True)
    
  • Dependencies: Python, PyTorch, Transformers, vLLM, or SGLang.
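
Once the SGLang server above is running, it exposes an OpenAI-compatible endpoint. A minimal client sketch, assuming the server's default port 30000 and the openai Python package (adjust to your launch flags):

    from openai import OpenAI

    # Assumes sglang.launch_server is listening on its default port 30000.
    client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V2-Chat",
        messages=[{"role": "user", "content": "Write a quicksort function in Python."}],
        max_tokens=256)
    print(resp.choices[0].message.content)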

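For vLLM, a hedged generation sketch building on the engine constructed above; the chat template comes from the model's tokenizer, and the sampling values here are illustrative rather than recommended settings:

    from transformers import AutoTokenizer
    from vllm import LLM, SamplingParams

    # Same engine construction as in the vLLM bullet above.
    llm = LLM(model="deepseek-ai/DeepSeek-V2-Chat", tensor_parallel_size=8,
              max_model_len=8192, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V2-Chat", trust_remote_code=True)

    # Render the chat template, then sample a completion.
    messages = [{"role": "user", "content": "Who are you?"}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    sampling = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])
    print(llm.generate([prompt], sampling)[0].outputs[0].text)
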
Highlighted Details

  • 236B total parameters, 21B activated per token.
  • Supports up to 128k context length.
  • Achieves state-of-the-art performance on Chinese benchmarks (C-Eval, CMMLU) and competitive results on English and coding tasks.
  • Offers a 16B parameter "Lite" version for reduced resource requirements.

Maintenance & Community

  • Developed by DeepSeek-AI.
  • Official chat available at chat.deepseek.com.
  • OpenAI-compatible API available at platform.deepseek.com (see the sketch below).
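
A minimal sketch of calling the hosted API with the openai client; the base URL and model name below follow the platform documentation at the time of writing and should be verified against current docs, with your own API key supplied:

    from openai import OpenAI

    # Base URL and model name per platform.deepseek.com; supply your own API key.
    client = OpenAI(base_url="https://api.deepseek.com", api_key="<your-deepseek-api-key>")
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "Hello"}])
    print(resp.choices[0].message.content)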

Licensing & Compatibility

  • Code licensed under MIT.
  • Model usage subject to Model License.
  • Supports commercial use for DeepSeek-V2 Base/Chat models.

Limitations & Caveats

The Hugging Face Transformers integration is not fully optimized and may run noticeably slower than DeepSeek's internal codebase. For efficient inference, the project recommends serving the model with SGLang or vLLM.

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0.2%
462
MoE model for research
Created 4 months ago
Updated 4 weeks ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

yarn by jquesnelle

0.6%
2k
Context window extension method for LLMs (research paper, models)
Created 2 years ago
Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks

0%
3k
Large language model for research/commercial use
Created 1 year ago
Updated 1 year ago
Starred by Phil Wang (Prolific Research Paper Implementer), Lianmin Zheng (Coauthor of SGLang, vLLM), and 6 more.

Kimi-K2 by MoonshotAI

1.7%
8k
State-of-the-art MoE language model
Created 2 months ago
Updated 1 week ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Yiran Wu (Coauthor of AutoGen), and 25 more.

grok-1 by xai-org

0.1%
51k
JAX example code for loading and running Grok-1 open-weights model
Created 1 year ago
Updated 1 year ago