DeepSeek-LLM by deepseek-ai

Large language model for research/commercial use

created 1 year ago
6,486 stars

Top 8.0% on sourcepulse

View on GitHub
Project Summary

DeepSeek LLM provides open-source access to powerful 7B and 67B parameter language models, trained on 2 trillion tokens in English and Chinese. These models are designed for researchers and developers, offering strong performance in reasoning, coding, mathematics, and Chinese language comprehension, with the 67B Chat model achieving notable results on challenging exams and coding benchmarks.

How It Works

DeepSeek LLM models are based on the LLaMA architecture, using Multi-Head Attention (MHA) in the 7B model and Grouped-Query Attention (GQA) in the 67B model. They are trained with the AdamW optimizer and a multi-step learning-rate schedule at a 4096-token sequence length. The training data pipeline emphasizes quality through heuristic rules, model-based filtering, and MinhashLSH deduplication, while respecting privacy and copyright.
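The deduplication step can be pictured with a small MinHash + LSH filter. This is a minimal sketch, assuming the third-party datasketch library and illustrative settings (word 5-gram shingles, 128 permutations, 0.8 Jaccard threshold); the README does not specify DeepSeek's actual tooling or thresholds.

```python
# Sketch of MinHash + LSH near-duplicate filtering (illustrative parameters).
from datasketch import MinHash, MinHashLSH

def minhash_of(text, ngram=5, num_perm=128):
    """Build a MinHash signature over word n-gram shingles of a document."""
    tokens = text.split()
    m = MinHash(num_perm=num_perm)
    for i in range(max(1, len(tokens) - ngram + 1)):
        m.update(" ".join(tokens[i:i + ngram]).encode("utf-8"))
    return m

def deduplicate(docs, threshold=0.8, num_perm=128):
    """Keep each document only if no near-duplicate has been kept already."""
    lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
    kept = []
    for doc_id, text in docs.items():
        sig = minhash_of(text, num_perm=num_perm)
        if not lsh.query(sig):  # no previously kept doc above the similarity threshold
            lsh.insert(doc_id, sig)
            kept.append(doc_id)
    return kept

corpus = {
    "a": "the quick brown fox jumps over the lazy dog near the river bank today",
    "b": "the quick brown fox jumps over the lazy dog near the river bank",  # near-copy of "a"
    "c": "completely different text about training large language models at scale",
}
print(deduplicate(corpus))  # typically ["a", "c"]; LSH matching is probabilistic
```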

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Inference can be performed using Hugging Face Transformers or vLLM (see the sketch after this list).
  • Requires Python >= 3.8.
  • For 67B models, significant GPU resources are needed (e.g., 8x A100 40GB for inference).
  • Official Hugging Face model pages and vLLM examples are provided.
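
A minimal Transformers inference sketch for the chat model, assuming the Hugging Face model ID deepseek-ai/deepseek-llm-7b-chat; the dtype and generation settings are illustrative, so check the official model pages for the recommended configuration.

```python
# Minimal chat-inference sketch with Hugging Face Transformers (illustrative settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids.to(model.device), max_new_tokens=100)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```

For the 67B weights, the vLLM path would typically shard the model across GPUs with tensor parallelism (e.g., a tensor-parallel size of 8), in line with the 8x A100 40GB guidance above.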

Highlighted Details

  • DeepSeek LLM 67B Base outperforms Llama2 70B Base on reasoning, coding, math, and Chinese comprehension benchmarks.
  • DeepSeek LLM 67B Chat achieves 73.78 on HumanEval and 84.1 on GSM8K (0-shot).
  • Achieves a score of 65 on the Hungarian National High-School Exam.
  • Supports commercial use under the specified model license.

Licensing & Compatibility

  • Code repository is MIT licensed.
  • Model usage is subject to a separate Model License.
  • Commercial use is permitted.

Limitations & Caveats

Models may exhibit biases inherited from their training data, hallucinate factually incorrect content, and occasionally produce repetitive outputs. The README notes that adding multiple-choice training data improved multiple-choice benchmark scores but did not improve broader knowledge performance, so it was excluded from pre-training and fine-tuning to avoid overfitting to benchmarks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 4

Star History
190 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

Top 0.1% on sourcepulse
5k stars
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

Top 0.4% on sourcepulse
6k stars
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 6 more.

DeepSeek-R1 by deepseek-ai

Top 0.1% on sourcepulse
91k stars
Reasoning models research paper
created 6 months ago
updated 1 month ago