DeepSeek-LLM by deepseek-ai

Large language model for research/commercial use

Created 1 year ago
6,550 stars

Top 7.8% on SourcePulse

View on GitHub
Project Summary

DeepSeek LLM provides open-source access to powerful 7B and 67B parameter language models, trained on 2 trillion tokens in English and Chinese. These models are designed for researchers and developers, offering strong performance in reasoning, coding, mathematics, and Chinese language comprehension, with the 67B Chat model achieving notable results on challenging exams and coding benchmarks.

How It Works

DeepSeek LLM models are based on the LLaMA architecture, using Multi-Head Attention (MHA) in the 7B version and Grouped-Query Attention (GQA) in the 67B version. They are trained with the AdamW optimizer and a multi-step learning rate schedule at a 4096-token sequence length. The training data pipeline emphasizes quality through heuristic rules, model-based filtering, and MinHash LSH deduplication, while respecting privacy and copyright.
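
The deduplication step can be illustrated with a minimal sketch. This is not the project's actual pipeline code: it assumes the third-party datasketch library, toy documents, and an illustrative similarity threshold, and only shows how MinHash signatures plus LSH can drop near-duplicate texts before training.

    # Hypothetical MinHash LSH deduplication sketch (not DeepSeek's pipeline code).
    # Requires: pip install datasketch
    from datasketch import MinHash, MinHashLSH

    def signature(text: str, num_perm: int = 128) -> MinHash:
        """Build a MinHash signature from whitespace tokens."""
        m = MinHash(num_perm=num_perm)
        for token in text.lower().split():
            m.update(token.encode("utf-8"))
        return m

    # Jaccard-similarity threshold of 0.8 is an illustrative choice.
    lsh = MinHashLSH(threshold=0.8, num_perm=128)

    corpus = {
        "doc-1": "the quick brown fox jumps over the lazy dog",
        "doc-2": "the quick brown fox jumps over the lazy dog today",  # near-duplicate
        "doc-3": "a completely different training document",
    }

    kept = []
    for key, text in corpus.items():
        sig = signature(text)
        if lsh.query(sig):       # a similar document was already kept -> skip this one
            continue
        lsh.insert(key, sig)     # index and keep this document
        kept.append(key)

    print(kept)  # typically ['doc-1', 'doc-3']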

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Inference can be performed with Hugging Face Transformers or vLLM (see the sketch after this list).
  • Requires Python >= 3.8.
  • For 67B models, significant GPU resources are needed (e.g., 8x A100 40GB for inference).
  • Official Hugging Face model pages and vLLM examples are provided.
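
As a minimal example of the Transformers path, the sketch below loads the publicly released deepseek-ai/deepseek-llm-7b-chat checkpoint from Hugging Face and generates a reply; the dtype, device placement, and generation settings are illustrative assumptions, and the 67B checkpoints need the multi-GPU setup noted above.

    # Minimal Hugging Face Transformers inference sketch (settings are illustrative).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/deepseek-llm-7b-chat"  # 67B variants require multiple GPUs

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumed dtype; pick what your hardware supports
        device_map="auto",
    )

    messages = [{"role": "user", "content": "Who are you?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(input_ids, max_new_tokens=100)
    # Decode only the newly generated tokens after the prompt.
    print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))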

Highlighted Details

  • DeepSeek LLM 67B Base outperforms Llama2 70B Base on reasoning, coding, math, and Chinese comprehension benchmarks.
  • DeepSeek LLM 67B Chat achieves 73.78 (Pass@1) on HumanEval and 84.1 (0-shot) on GSM8K.
  • Demonstrates strong performance on the Hungarian National High-School Exam (score of 65).
  • Supports commercial use under the specified model license.

Maintenance & Community

Licensing & Compatibility

  • Code repository is MIT licensed.
  • Model usage is subject to a separate Model License.
  • Commercial use is permitted.

Limitations & Caveats

Models may exhibit biases inherited from training data, generate factually incorrect "hallucinations," and sometimes produce repetitive outputs. The README notes that adding multiple-choice training data improved benchmark scores but did not enhance general knowledge performance, leading to its exclusion from pre-training/fine-tuning to avoid overfitting.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 51 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab

MoE model for research
462 stars · Top 0.2% on SourcePulse
Created 4 months ago · Updated 4 weeks ago
Starred by Didier Lopes (Founder of OpenBB), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

DeepSeek-Coder-V2 by deepseek-ai

Open-source code language model comparable to GPT4-Turbo
6k stars · Top 0.3% on SourcePulse
Created 1 year ago · Updated 11 months ago
Starred by Georgi Gerganov (Author of llama.cpp, whisper.cpp), Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), and 13 more.

Qwen3 by QwenLM

Large language model series by Qwen team, Alibaba Cloud
25k stars · Top 0.4% on SourcePulse
Created 1 year ago · Updated 2 weeks ago