DeepSeek-LLM by deepseek-ai

Large language model for research/commercial use

created 1 year ago
6,486 stars

Top 8.0% on sourcepulse

View on GitHub
Project Summary

DeepSeek LLM provides open-source access to powerful 7B and 67B parameter language models, trained on 2 trillion tokens in English and Chinese. These models are designed for researchers and developers, offering strong performance in reasoning, coding, mathematics, and Chinese language comprehension, with the 67B Chat model achieving notable results on challenging exams and coding benchmarks.

How It Works

DeepSeek LLM models are based on the LLaMA architecture, using Multi-Head Attention (MHA) in the 7B model and Grouped-Query Attention (GQA) in the 67B model. They are trained with the AdamW optimizer and a multi-step learning-rate schedule at a 4096-token sequence length. The training data pipeline emphasizes quality through heuristic rules, model-based filtering, and MinhashLSH deduplication, while respecting privacy and copyright.
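The deduplication step can be pictured with a small MinHash + LSH filter. This is a minimal sketch, assuming the third-party datasketch library and illustrative settings (word 5-gram shingles, 128 permutations, 0.8 Jaccard threshold); the README does not specify DeepSeek's actual tooling or thresholds.

```python
# Sketch of MinHash + LSH near-duplicate filtering (illustrative parameters).
from datasketch import MinHash, MinHashLSH

def minhash_of(text, ngram=5, num_perm=128):
    """Build a MinHash signature over word n-gram shingles of a document."""
    tokens = text.split()
    m = MinHash(num_perm=num_perm)
    for i in range(max(1, len(tokens) - ngram + 1)):
        m.update(" ".join(tokens[i:i + ngram]).encode("utf-8"))
    return m

def deduplicate(docs, threshold=0.8, num_perm=128):
    """Keep each document only if no near-duplicate has been kept already."""
    lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
    kept = []
    for doc_id, text in docs.items():
        sig = minhash_of(text, num_perm=num_perm)
        if not lsh.query(sig):  # no previously kept doc above the similarity threshold
            lsh.insert(doc_id, sig)
            kept.append(doc_id)
    return kept

corpus = {
    "a": "the quick brown fox jumps over the lazy dog near the river bank today",
    "b": "the quick brown fox jumps over the lazy dog near the river bank",  # near-copy of "a"
    "c": "completely different text about training large language models at scale",
}
print(deduplicate(corpus))  # typically ["a", "c"]; LSH matching is probabilistic
```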

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Inference can be performed using Hugging Face Transformers or vLLM (see the sketch after this list).
  • Requires Python >= 3.8.
  • For 67B models, significant GPU resources are needed (e.g., 8x A100 40GB for inference).
  • Official Hugging Face model pages and vLLM examples are provided.
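
A minimal Transformers inference sketch for the chat model, assuming the Hugging Face model ID deepseek-ai/deepseek-llm-7b-chat; the dtype and generation settings are illustrative, so check the official model pages for the recommended configuration.

```python
# Minimal chat-inference sketch with Hugging Face Transformers (illustrative settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids.to(model.device), max_new_tokens=100)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```

For the 67B weights, the vLLM path would typically shard the model across GPUs with tensor parallelism (e.g., a tensor-parallel size of 8), in line with the 8x A100 40GB guidance above.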

Highlighted Details

  • DeepSeek LLM 67B Base outperforms Llama2 70B Base on reasoning, coding, math, and Chinese comprehension benchmarks.
  • DeepSeek LLM 67B Chat achieves 73.78 on HumanEval and 84.1 on GSM8K (0-shot).
  • Achieves a score of 65 on the Hungarian National High-School Exam.
  • Supports commercial use under the specified model license.

Licensing & Compatibility

  • Code repository is MIT licensed.
  • Model usage is subject to a separate Model License.
  • Commercial use is permitted.

Limitations & Caveats

Models may exhibit biases inherited from their training data, hallucinate factually incorrect content, and occasionally produce repetitive outputs. The README notes that adding multiple-choice training data improved multiple-choice benchmark scores but did not improve broader knowledge performance, so it was excluded from pre-training and fine-tuning to avoid overfitting to benchmarks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 4

Star History
190 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

Top 0.1% on sourcepulse
5k stars
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

Top 0.4% on sourcepulse
6k stars
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 6 more.

DeepSeek-R1 by deepseek-ai

Top 0.1% on sourcepulse
91k stars
Reasoning models research paper
created 6 months ago
updated 1 month ago