DeepSeek-Coder-V2 by deepseek-ai

Open-source code language model comparable to GPT-4 Turbo

created 1 year ago
5,967 stars

Top 8.8% on sourcepulse

Project Summary

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) large language model designed for code intelligence. It aims to rival closed-source models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding, mathematical reasoning, and general language understanding, and it supports 338 programming languages with a 128K context window.

How It Works

DeepSeek-Coder-V2 is built on the DeepSeekMoE framework and further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. The MoE architecture enables efficient inference by activating only a subset of parameters per token (2.4B for Lite, 21B for the full model) while keeping a large total parameter count (16B for Lite, 236B for the full model). This design balances high performance with manageable compute requirements.
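
For intuition, the sketch below shows a generic top-k expert-routing layer in PyTorch. It illustrates the MoE idea (only a few experts run per token), and is not DeepSeek's actual implementation; the real model's expert count, gating, and shared-expert design differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k Mixture-of-Experts layer (illustration only)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert per token,
        # but only the top-k experts are actually evaluated.
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Only k of n_experts expert MLPs run per token, which is why the active
# parameter counts (2.4B / 21B) are far smaller than the totals (16B / 236B).
layer = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```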

Quick Start & Requirements

  • Inference: Use Hugging Face Transformers or an optimized serving framework such as SGLang or vLLM (see the sketch after this list).
  • Hardware: BF16 inference for the 236B model requires 8x A100 80GB GPUs. Lite models are more accessible.
  • Dependencies: PyTorch, Transformers, SGLang, or vLLM.
  • Resources: Links to Hugging Face model downloads are provided.
  • Docs: How to Use
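
As a reference point (not a substitute for the official docs), the sketch below loads the Lite instruct model with Hugging Face Transformers. It assumes the published model id deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct and a GPU with enough memory for BF16 weights.

```python
# Minimal inference sketch (assumes the model id below and `accelerate`
# installed for device_map="auto"); see the repo's "How to Use" section
# for the authoritative snippets.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 inference
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```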

Highlighted Details

  • Achieves state-of-the-art performance among open-source models on various coding benchmarks (HumanEval, MBPP+, RepoBench) and mathematical reasoning tasks (GSM8K, MATH).
  • Outperforms or matches leading closed-source models on several coding and math benchmarks.
  • Supports a 128K context window, evaluated with Needle In A Haystack tests.
  • Offers both base and instruct-tuned versions, with a "Lite" variant for reduced resource usage.

Maintenance & Community

  • Developed by DeepSeek AI.
  • Contact: service@deepseek.com
  • Issues can be raised on the GitHub repository.

Licensing & Compatibility

  • Code repository: MIT License.
  • Model weights: Subject to a separate Model License.
  • Commercial use: Supported for DeepSeek-Coder-V2 Base/Instruct models.

Limitations & Caveats

  • The full 236B parameter model has significant hardware requirements (8x 80GB GPUs for BF16 inference).
  • The 16B Lite instruct model is sensitive to chat template formatting; deviating from the expected template can produce responses in the wrong language or garbled text (see the sketch below).
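
One way to avoid hand-rolled prompt formatting is to let the tokenizer render the chat template itself, as in the sketch below (model id assumed; the template ships with the tokenizer).

```python
# Sketch: render the chat template to a string to see the exact prompt format
# the instruct model expects, instead of concatenating role markers by hand.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True
)
messages = [{"role": "user", "content": "Explain Python list comprehensions."}]
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)  # inspect the rendered prompt before sending it to the model
```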

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 321 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

  • MoE language model for research/API use
  • Top 0.1% on sourcepulse · 5k stars
  • Created 1 year ago; updated 10 months ago

Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 6 more.

DeepSeek-R1 by deepseek-ai

  • Reasoning models research paper
  • Top 0.1% on sourcepulse · 91k stars
  • Created 6 months ago; updated 1 month ago