DeepSeek-V2 by deepseek-ai

MoE language model for research/API use

created 1 year ago
4,935 stars

Top 10.2% on sourcepulse

Project Summary

DeepSeek-V2 is a powerful Mixture-of-Experts (MoE) language model designed for strong performance, economical training, and efficient inference. It targets researchers and developers who need high-quality language generation and coding capabilities. Compared with its dense predecessor DeepSeek 67B, it saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts maximum generation throughput to 5.76x.

How It Works

DeepSeek-V2 employs two key architectural innovations: Multi-head Latent Attention (MLA) for efficient inference by compressing key-value pairs, and the DeepSeekMoE architecture for FFNs, enabling cost-effective training of larger models. This combination allows for a 236B total parameter model with only 21B activated per token, drastically reducing KV cache requirements and boosting throughput.
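
The core of both ideas can be illustrated with a toy sketch. The dimensions, the top-2 router, and the naive per-token loop below are illustrative only, not the released architecture code:

    import torch

    d_model, d_latent, n_heads, d_head = 1024, 64, 8, 128   # illustrative sizes only
    seq_len = 16
    x = torch.randn(seq_len, d_model)                        # hidden states for 16 tokens

    # Multi-head Latent Attention: cache one small latent per token instead of full K/V,
    # then reconstruct keys and values from it on the fly.
    W_dkv = torch.randn(d_model, d_latent)                   # down-projection (compression)
    W_uk = torch.randn(d_latent, n_heads * d_head)           # up-projection to keys
    W_uv = torch.randn(d_latent, n_heads * d_head)           # up-projection to values
    kv_cache = x @ W_dkv                                      # cached: 16 x 64 values per layer,
    k = (kv_cache @ W_uk).view(seq_len, n_heads, d_head)     # instead of 16 x 2*8*128 for plain MHA
    v = (kv_cache @ W_uv).view(seq_len, n_heads, d_head)

    # MoE FFN: a gate routes each token to a few experts, so only a fraction of the
    # parameters is active per token (236B total vs. ~21B activated in DeepSeek-V2).
    n_experts, top_k = 8, 2
    experts = [torch.nn.Sequential(torch.nn.Linear(d_model, 4 * d_model), torch.nn.GELU(),
                                   torch.nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)]
    gate = torch.nn.Linear(d_model, n_experts)
    weights, idx = gate(x).softmax(dim=-1).topk(top_k, dim=-1)
    moe_out = torch.stack([sum(w * experts[e](x[t]) for w, e in zip(weights[t], idx[t]))
                           for t in range(seq_len)])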

Quick Start & Requirements

  • Hugging Face Transformers: BF16 inference requires 8x 80GB GPUs; the snippet caps per-GPU usage at 75GB to leave headroom. A completion example follows this list.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    model_name = "deepseek-ai/DeepSeek-V2"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    max_memory = {i: "75GB" for i in range(8)}  # cap per-GPU memory across 8 GPUs
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, device_map="sequential",
        torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
    
  • SGLang (Recommended): Supports MLA, FP8, and Torch Compile for optimal performance.
    python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V2-Chat --tp 8 --trust-remote-code
    
  • vLLM (Recommended): Requires a vLLM build with the DeepSeek-V2 support PR merged.
    from vllm import LLM, SamplingParams
    # SamplingParams configures decoding for llm.generate(); format chat prompts
    # with the tokenizer's chat template before passing them in.
    llm = LLM(model="deepseek-ai/DeepSeek-V2-Chat", tensor_parallel_size=8, max_model_len=8192, trust_remote_code=True)
    
  • Dependencies: Python, PyTorch, and one of Hugging Face Transformers, SGLang, or vLLM for inference.
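
A minimal completion call for the model loaded with Transformers above. This is a sketch: the prompt and max_new_tokens are illustrative, and the generation config is pulled from the model repository:

    from transformers import GenerationConfig

    model.generation_config = GenerationConfig.from_pretrained(model_name)
    model.generation_config.pad_token_id = model.generation_config.eos_token_id

    text = "An attention function can be described as mapping a query and a set of key-value pairs to an output,"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))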

Highlighted Details

  • 236B total parameters, 21B activated per token.
  • Supports up to 128k context length.
  • Achieves state-of-the-art performance on Chinese benchmarks (C-Eval, CMMLU) and competitive results on English and coding tasks.
  • Offers DeepSeek-V2-Lite, a 15.7B-parameter (2.4B activated) variant that can run on a single 40GB GPU.

Maintenance & Community

  • Developed by DeepSeek-AI.
  • Official chat available at chat.deepseek.com.
  • OpenAI-compatible API available at platform.deepseek.com (example below).
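
Because the hosted API is OpenAI-compatible, it can be called with the standard openai client. The base URL ("https://api.deepseek.com") and model name ("deepseek-chat") below are assumptions based on the platform documentation and may change; check platform.deepseek.com:

    # Sketch of a hosted-API call; requires an API key from platform.deepseek.com.
    from openai import OpenAI

    client = OpenAI(api_key="<your DeepSeek API key>", base_url="https://api.deepseek.com")
    response = client.chat.completions.create(
        model="deepseek-chat",  # assumed model name for the hosted chat endpoint
        messages=[{"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."}],
    )
    print(response.choices[0].message.content)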

Licensing & Compatibility

  • Code licensed under MIT.
  • Model weights are subject to the Model License.
  • The Model License permits commercial use of the DeepSeek-V2 Base/Chat models.

Limitations & Caveats

Inference with the Hugging Face Transformers implementation is slower than DeepSeek's internal codebase. For efficient inference, use SGLang or vLLM with the configurations shown above.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

78 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

Top 0.4% on sourcepulse
6k stars
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 6 more.

DeepSeek-R1 by deepseek-ai

Top 0.1% on sourcepulse
91k stars
Reasoning models research paper
created 6 months ago
updated 1 month ago