Large language model for research/commercial use
Top 8.0% on sourcepulse
DeepSeek LLM provides open-source access to powerful 7B and 67B parameter language models, trained on 2 trillion tokens in English and Chinese. These models are designed for researchers and developers, offering strong performance in reasoning, coding, mathematics, and Chinese language comprehension, with the 67B Chat model achieving notable results on challenging exams and coding benchmarks.
How It Works
DeepSeek LLM models are based on the LLaMA architecture, using Multi-Head Attention (MHA) in the 7B version and Grouped-Query Attention (GQA) in the 67B version. They are trained with the AdamW optimizer and a multi-step learning rate schedule at a sequence length of 4096 tokens. The training data pipeline emphasizes data quality through heuristic rules, model-based filtering, and MinHash-LSH deduplication, while respecting privacy and copyright.
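The README does not spell out the deduplication code, but the MinHash-LSH step can be illustrated with a small sketch. The snippet below uses the datasketch library and a whitespace tokenizer, both chosen here only for illustration (the repository does not name a specific tool); it keeps the first occurrence of each near-duplicate cluster.

```python
# Minimal sketch of MinHash-LSH deduplication (datasketch is an assumed choice).
from datasketch import MinHash, MinHashLSH

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    """Build a MinHash signature from whitespace-tokenized text."""
    m = MinHash(num_perm=num_perm)
    for token in text.split():
        m.update(token.encode("utf-8"))
    return m

docs = {
    "doc1": "deepseek llm is trained on two trillion tokens",
    "doc2": "deepseek llm is trained on 2 trillion tokens",
    "doc3": "grouped query attention reduces kv cache size",
}

# Documents whose estimated Jaccard similarity exceeds the threshold collide in the index.
lsh = MinHashLSH(threshold=0.7, num_perm=128)
kept = []
for key, text in docs.items():
    sig = minhash_of(text)
    if lsh.query(sig):      # a near-duplicate is already indexed
        continue            # drop this document
    lsh.insert(key, sig)
    kept.append(key)

print(kept)  # doc2 is likely dropped as a near-duplicate of doc1
```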
Quick Start & Requirements
pip install -r requirements.txt
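After installing the requirements, the models can be loaded through Hugging Face Transformers. The sketch below assumes the deepseek-ai/deepseek-llm-7b-chat checkpoint, a bfloat16-capable GPU, and default generation settings; consult the repository's own example scripts for the authoritative usage.

```python
# Minimal sketch: load the 7B chat model and generate a reply (assumes a bfloat16-capable GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain grouped-query attention in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```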
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Models may exhibit biases inherited from their training data, generate factually incorrect "hallucinations," and sometimes produce repetitive output. The README notes that adding multiple-choice question data to training improved benchmark scores but did not improve general knowledge performance, so such data was excluded from pre-training and fine-tuning to avoid benchmark overfitting.