DeepSeek-Coder-V2 by deepseek-ai

Open-source code language model comparable to GPT-4 Turbo

Created 1 year ago
6,082 stars

Top 8.5% on SourcePulse

View on GitHub
Project Summary

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) large language model designed for code intelligence tasks. It aims to rival closed-source models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding, mathematical reasoning, and general language understanding, and it supports 338 programming languages and a 128K-token context window.

How It Works

DeepSeek-Coder-V2 is built on the DeepSeekMoE framework and further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens. The MoE architecture enables efficient inference by activating only a subset of parameters per token (2.4B for Lite, 21B for the full model) while keeping a large total parameter count (16B for Lite, 236B for the full model). This design balances high performance with manageable computational requirements.
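
The sketch below illustrates the general idea of top-k expert routing, which is why only a fraction of an MoE model's parameters is active per token. It is a minimal, generic example: the class name, layer sizes, expert count, and k are illustrative assumptions and do not reflect DeepSeek-Coder-V2's actual architecture or configuration.

```python
# Minimal sketch of top-k expert routing in an MoE layer.
# All sizes below are illustrative, NOT DeepSeek-Coder-V2's real config.
import torch
import torch.nn as nn


class TopKMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (n_tokens, d_model)
        scores = self.gate(x).softmax(dim=-1)           # routing probabilities per expert
        weights, idx = scores.topk(self.k, dim=-1)      # keep only k experts per token
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):                      # only selected experts run,
            for e, expert in enumerate(self.experts):   # so most parameters stay idle
                hit = idx[:, slot] == e                 # tokens routed to expert e
                if hit.any():
                    out[hit] += weights[hit, slot].unsqueeze(-1) * expert(x[hit])
        return out


print(TopKMoELayer()(torch.randn(4, 512)).shape)        # torch.Size([4, 512])
```

Total parameter count grows with the number of experts, but each token only pays the compute cost of its k selected experts, which is the trade-off the Lite (2.4B active of 16B) and full (21B active of 236B) models exploit.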

Quick Start & Requirements

  • Inference: Use Hugging Face Transformers, or optimized serving frameworks such as SGLang or vLLM (see the sketch after this list).
  • Hardware: BF16 inference for the 236B model requires 8x A100 80GB GPUs. Lite models are more accessible.
  • Dependencies: PyTorch, Transformers, SGLang, or vLLM.
  • Resources: Links to Hugging Face model downloads are provided.
  • Docs: How to Use
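
For orientation, here is a minimal sketch of local code completion with the Lite base model via Hugging Face Transformers. The model ID, bfloat16 dtype, and generation settings are reasonable assumptions and should be checked against the project's "How to Use" docs and the Hugging Face model cards.

```python
# Sketch: code completion with the Lite base model via Hugging Face Transformers.
# Model ID and dtype are assumptions; see the repository's "How to Use" section.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

prompt = "# write a quick sort algorithm in python\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```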

Highlighted Details

  • Achieves state-of-the-art performance among open-source models on various coding benchmarks (HumanEval, MBPP+, RepoBench) and mathematical reasoning tasks (GSM8K, MATH).
  • Outperforms or matches leading closed-source models on several coding and math benchmarks.
  • Supports a 128K context window, evaluated with Needle In A Haystack tests.
  • Offers both base and instruct-tuned versions, with a "Lite" variant for reduced resource usage.

Maintenance & Community

  • Developed by DeepSeek AI.
  • Contact: service@deepseek.com
  • Issues can be raised on the GitHub repository.

Licensing & Compatibility

  • Code repository: MIT License.
  • Model weights: Subject to a separate Model License.
  • Commercial use: Supported for DeepSeek-Coder-V2 Base/Instruct models.

Limitations & Caveats

  • The full 236B parameter model has significant hardware requirements (8x 80GB GPUs for BF16 inference).
  • The 16B-Lite model is sensitive to chat template formatting; deviating from the expected template can cause responses in the wrong language or garbled text (a sketch using the bundled chat template follows this list).
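
As a hedged illustration of the safer formatting path, the snippet below uses the tokenizer's bundled chat template via apply_chat_template instead of hand-building the prompt. The model ID is an assumption; the template shipped with the checkpoint remains the authority.

```python
# Sketch: chat-style generation with the tokenizer's built-in chat template.
# apply_chat_template avoids hand-crafting the prompt; the model ID is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

messages = [{"role": "user", "content": "Write a function that checks whether a string is a palindrome."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```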
Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 0

Star History

  • 90 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

  • MoE model for research
  • 462 stars · 0.2%
  • Created 4 months ago, updated 4 weeks ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Vincent Weisser (Cofounder of Prime Intellect), and 15 more.

codellama by meta-llama

  • Inference code for CodeLlama models
  • 16k stars · 0.0%
  • Created 2 years ago, updated 1 year ago