DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) large language model designed for code intelligence tasks. It aims to rival closed-source models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding, mathematical reasoning, and general language understanding, and it supports 338 programming languages with a 128K context window.
How It Works
DeepSeek-Coder-V2 is built on the DeepSeekMoE framework and is further pre-trained from an intermediate DeepSeek-V2 checkpoint with an additional 6 trillion tokens. The MoE architecture enables efficient inference by activating only a subset of parameters per token (2.4B for Lite, 21B for the full model) while maintaining a large total parameter count (16B for Lite, 236B for the full model). This design balances high performance with manageable computational requirements.
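As a rough illustration of the routing idea (a toy sketch, not DeepSeek's actual implementation), the PyTorch snippet below sends each token to only a top-k subset of expert MLPs, so most of the layer's parameters stay idle for any given token:

```python
# Toy top-k MoE routing layer (illustrative only; not DeepSeekMoE itself).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores every expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                      # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)              # (tokens, num_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)          # keep only the top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out


tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([10, 64])
```

Only `top_k` of the `num_experts` expert MLPs run for each token, which is why a 236B-parameter model can serve requests with roughly 21B parameters active per token.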
Quick Start & Requirements
- Inference: Use Hugging Face Transformers or optimized serving frameworks such as SGLang or vLLM (see the sketch after this list).
- Hardware: BF16 inference for the 236B model requires 8x A100 80GB GPUs. Lite models are more accessible.
- Dependencies: PyTorch, Transformers, SGLang, or vLLM.
- Resources: Links to Hugging Face model downloads are provided.
- Docs: see the "How to Use" section of the repository README.
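A minimal inference sketch with Hugging Face Transformers, assuming the Lite instruct model and a single CUDA GPU (model ID and generation settings are illustrative; consult the repository's "How to Use" docs for the officially documented invocation):

```python
# Minimal Transformers inference sketch for the Lite instruct model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 keeps memory use manageable on a single 80GB GPU
    trust_remote_code=True,
).cuda()

messages = [{"role": "user", "content": "Write a Python function that checks if a number is prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

For the full 236B model, a serving framework such as SGLang or vLLM is the more practical route, given the multi-GPU memory requirement.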
Highlighted Details
- Achieves state-of-the-art performance among open-source models on various coding benchmarks (HumanEval, MBPP+, RepoBench) and mathematical reasoning tasks (GSM8K, MATH).
- Outperforms or matches leading closed-source models on several coding and math benchmarks.
- Supports a 128K context window, evaluated with Needle In A Haystack tests.
- Offers both base and instruct-tuned versions, with a "Lite" variant for reduced resource usage.
Maintenance & Community
- Developed by DeepSeek AI.
- Contact: service@deepseek.com
- Issues can be raised on the GitHub repository.
Licensing & Compatibility
- Code repository: MIT License.
- Model weights: Subject to a separate Model License.
- Commercial use: Supported for DeepSeek-Coder-V2 Base/Instruct models.
Limitations & Caveats
- The full 236B parameter model has significant hardware requirements (8x 80GB GPUs for BF16 inference).
- Using the documented chat template is crucial for the 16B-Lite instruct model; deviating from it can lead to responses in the wrong language or garbled output (see the sketch below).
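As a minimal sketch of what relying on the chat template means in practice (model ID assumed as above), the snippet below lets tokenizer.apply_chat_template render the prompt so the role markers and special tokens match what the instruct model was tuned on, rather than hand-building the prompt string:

```python
# Sketch: inspect the rendered chat template instead of hand-crafting prompts.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True
)

messages = [{"role": "user", "content": "Reverse a linked list in Python."}]

# tokenize=False returns the fully formatted prompt string, which makes it easy
# to verify that the role markers and special tokens match what the model expects.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```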