DeepSeek-V2 by deepseek-ai

MoE language model for research/API use

Created 1 year ago
4,954 stars

Top 10.1% on SourcePulse

View on GitHub
Project Summary

DeepSeek-V2 is a powerful Mixture-of-Experts (MoE) language model designed for strong performance, economical training, and efficient inference. It targets researchers and developers needing high-quality language generation and coding capabilities, offering significant improvements in speed and cost reduction over previous models.

How It Works

DeepSeek-V2 employs two key architectural innovations: Multi-head Latent Attention (MLA), which compresses keys and values into a low-rank latent vector so the KV cache stays small at inference time, and the DeepSeekMoE architecture for its feed-forward networks (FFNs), which makes training large sparse models economical. Together they yield a 236B total-parameter model with only 21B parameters activated per token; the technical report cites a 93.3% KV-cache reduction and up to 5.76x higher maximum generation throughput versus DeepSeek 67B.
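
The KV-cache saving behind MLA can be illustrated with a back-of-the-envelope sketch. The Python snippet below is a toy comparison, not DeepSeek-V2's actual implementation; the layer count, head count, head dimension, and latent dimension are assumed values chosen only to show why caching one small latent vector per token is far cheaper than caching full per-head keys and values.

    # Toy per-token KV-cache comparison: standard multi-head attention (MHA)
    # vs. an MLA-style scheme that caches a compressed latent instead.
    # All dimensions are illustrative assumptions, not the model's real config.
    n_layers   = 60    # assumed transformer depth
    n_heads    = 128   # assumed attention heads
    d_head     = 128   # assumed per-head dimension
    d_latent   = 512   # assumed compressed KV latent dimension
    bytes_bf16 = 2

    # MHA caches full keys and values for every head at every layer.
    mha_bytes = n_layers * 2 * n_heads * d_head * bytes_bf16

    # MLA caches only one shared latent vector per token per layer
    # (the small decoupled RoPE key is ignored here for simplicity),
    # reconstructing keys and values from it during attention.
    mla_bytes = n_layers * d_latent * bytes_bf16

    print(f"per-token KV cache, MHA: {mha_bytes / 1e6:.2f} MB")
    print(f"per-token KV cache, MLA: {mla_bytes / 1e6:.2f} MB")
    print(f"reduction: {1 - mla_bytes / mha_bytes:.1%}")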

Quick Start & Requirements

  • Hugging Face Transformers: BF16 inference requires 8x80GB GPUs (the snippet caps usage at 75GB per device); a text-generation sketch follows this list.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_name = "deepseek-ai/DeepSeek-V2"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    # Cap per-GPU memory at 75GB across 8 devices.
    max_memory = {i: "75GB" for i in range(8)}
    model = AutoModelForCausalLM.from_pretrained(
        model_name, trust_remote_code=True, device_map="sequential",
        torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
    
  • SGLang (Recommended): Supports MLA, FP8, and Torch Compile for optimal performance.
    python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V2-Chat --tp 8 --trust-remote-code
    
  • vLLM (Recommended): Requires merging a specific PR.
    from vllm import LLM, SamplingParams
    llm = LLM(model="deepseek-ai/DeepSeek-V2-Chat", tensor_parallel_size=8, max_model_len=8192, trust_remote_code=True)
    sampling_params = SamplingParams(temperature=0.3, max_tokens=256)
    outputs = llm.generate(["Who are you?"], sampling_params)
    
  • Dependencies: Python, PyTorch, and Hugging Face Transformers; SGLang or vLLM for optimized inference.
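
Once the model is loaded via the Transformers snippet above, a short generation call can serve as a smoke test. This is a minimal sketch assuming the model and tokenizer objects from the first bullet; the prompt and decoding settings are illustrative.

    import torch

    # Assumes `model` and `tokenizer` from the Transformers snippet above.
    prompt = "An attention function can be described as mapping a query and"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)

    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))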

Highlighted Details

  • 236B total parameters, 21B activated per token.
  • Supports up to 128k context length.
  • Achieves state-of-the-art performance on Chinese benchmarks (C-Eval, CMMLU) and competitive results on English and coding tasks.
  • Offers a 16B parameter "Lite" version for reduced resource requirements.

Maintenance & Community

  • Developed by DeepSeek-AI.
  • Official chat available at chat.deepseek.com.
  • OpenAI-compatible API available at platform.deepseek.com (a client sketch follows below).
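
Because the hosted service exposes an OpenAI-compatible API, the standard openai Python client can be pointed at it. This is a minimal sketch: the base URL, model name, and DEEPSEEK_API_KEY environment variable are assumptions for illustration; check platform.deepseek.com for the current endpoint, model identifiers, and key setup.

    import os
    from openai import OpenAI

    # Base URL and model name are assumed for illustration;
    # see platform.deepseek.com for the current values.
    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )

    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "Explain Mixture-of-Experts in one sentence."}],
        temperature=0.7,
    )
    print(response.choices[0].message.content)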

Licensing & Compatibility

  • Code licensed under MIT.
  • Model weights and usage are subject to the Model License.
  • Supports commercial use for DeepSeek-V2 Base/Chat models.

Limitations & Caveats

The Hugging Face Transformers implementation is slower than DeepSeek's internal codebase. For optimal inference, use SGLang or vLLM with the configurations noted above.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 22 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

0%
466
MoE model for research
Created 5 months ago
Updated 2 months ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

yarn by jquesnelle

0.4%
2k
Context window extension method for LLMs (research paper, models)
Created 2 years ago
Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks

0.0%
3k
Large language model for research/commercial use
Created 1 year ago
Updated 1 year ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Yiran Wu (Coauthor of AutoGen), and 25 more.

grok-1 by xai-org

0.1%
51k
JAX example code for loading and running Grok-1 open-weights model
Created 1 year ago
Updated 1 year ago