DeepSeek-Coder-V2 by deepseek-ai

Open-source code language model comparable to GPT-4 Turbo

Created 1 year ago
6,474 stars

Top 7.8% on SourcePulse

View on GitHub
Project Summary

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) large language model designed for code intelligence tasks. It aims to rival closed-source models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding, mathematical reasoning, and general language understanding, and it supports 338 programming languages and a 128K context window.

How It Works

DeepSeek-Coder-V2 is built on the DeepSeekMoE framework and further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. The MoE architecture enables efficient inference by activating only a subset of parameters per token (2.4B of 16B total for Lite; 21B of 236B total for the full model). This design balances high performance with manageable computational requirements.
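The "active vs. total parameters" idea can be illustrated with a toy top-k gating sketch: a router scores all experts but only the top few actually run for a given input. This is a minimal, generic MoE illustration; DeepSeekMoE's actual routing (shared experts, fine-grained expert segmentation) is more involved.

```python
import math

def top_k_gating(logits, k=2):
    """Select the k experts with the highest gate logits and
    softmax-normalize their weights (generic MoE sketch, not
    DeepSeekMoE's exact routing)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

def moe_forward(x, experts, gate_logits, k=2):
    """Combine only the selected experts' outputs; the remaining experts
    never execute, which is why active parameters << total parameters."""
    weights = top_k_gating(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy demo: 8 scalar "experts", of which only 2 run per input.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
y = moe_forward(3.0, experts, [0.2, 1.5, -0.3, 0.9, 0.0, 2.1, -1.0, 0.4])
```

Scaling the same idea up, the 236B model runs only 21B parameters' worth of experts per token, which is what keeps inference cost closer to a 21B dense model than a 236B one.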

Quick Start & Requirements

  • Inference: Use Huggingface Transformers or optimized frameworks like SGLang or vLLM.
  • Hardware: BF16 inference for the 236B model requires 8x A100 80GB GPUs. Lite models are more accessible.
  • Dependencies: PyTorch, Transformers, SGLang, or vLLM.
  • Resources: Links to Hugging Face model downloads are provided.
  • Docs: How to Use
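As a concrete starting point, code completion with the Lite base model via Hugging Face Transformers might look like the sketch below (assumes a CUDA GPU with enough memory for the 16B Lite model; the full 236B model instead needs the multi-GPU setup noted above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"

# trust_remote_code is required: the repo ships custom model/tokenizer code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).cuda()

# The base model does plain completion: it continues the prompt.
inputs = tokenizer("# write a quick sort algorithm\n", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For higher-throughput serving, the same model ID can be loaded in SGLang or vLLM instead of raw Transformers.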

Highlighted Details

  • Achieves state-of-the-art performance among open-source models on various coding benchmarks (HumanEval, MBPP+, RepoBench) and mathematical reasoning tasks (GSM8K, MATH).
  • Outperforms or matches leading closed-source models on several coding and math benchmarks.
  • Supports a 128K context window, evaluated with Needle In A Haystack tests.
  • Offers both base and instruct-tuned versions, with a "Lite" variant for reduced resource usage.

Maintenance & Community

  • Developed by DeepSeek AI.
  • Contact: service@deepseek.com
  • Issues can be raised on the GitHub repository.

Licensing & Compatibility

  • Code repository: MIT License.
  • Model weights: Subject to a separate Model License.
  • Commercial use: Supported for DeepSeek-Coder-V2 Base/Instruct models.

Limitations & Caveats

  • The full 236B parameter model has significant hardware requirements (8x 80GB GPUs for BF16 inference).
  • Specific chat template formatting is crucial for the 16B-Lite model to avoid issues like incorrect language responses or garbled text.
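One way to avoid template mistakes is to let the tokenizer render the chat template rather than concatenating role strings by hand; a sketch using Transformers' standard `apply_chat_template` API with the Lite instruct checkpoint:

```python
from transformers import AutoTokenizer

# trust_remote_code is needed because the repo ships custom tokenizer code.
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]

# Renders the model's own template, including special tokens and the
# generation prompt; hand-built prompts that deviate from it are a common
# cause of wrong-language or garbled responses with the Lite model.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
```

The resulting `input_ids` can be passed straight to `model.generate(...)`.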
Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 120 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Vincent Weisser (cofounder of Prime Intellect), and 15 more.

codellama by meta-llama

  • 16k stars
  • Inference code for CodeLlama models
  • Created 2 years ago; updated 1 year ago