DeepSeek-MoE by deepseek-ai

MoE language model for research purposes

created 1 year ago
1,757 stars

Top 25.0% on sourcepulse

View on GitHub
Project Summary

DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model designed for efficient inference and comparable performance to larger dense models. It targets researchers and developers seeking high-quality language models with reduced computational requirements, offering both base and chat variants.

How It Works

The model uses an MoE architecture built around two ideas: fine-grained expert segmentation and shared expert isolation. Because only a small subset of experts is activated for each token, inference requires roughly 40% of the computation of a dense model of similar capability, such as LLaMA2 7B.

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >= 3.8, PyTorch (bfloat16 recommended), Hugging Face Transformers.
  • Inference: Runs on a single GPU with 40GB of memory without quantization (see the sketch after this list).
  • Resources: Fine-tuning requires DeepSpeed and potentially multiple A100 GPUs (8x A100 40GB for full fine-tuning, 1x A100 80G for QLoRA).
  • Docs: Model Download, Quick Start, Evaluation Results
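
A minimal inference sketch, assuming the standard Hugging Face Transformers workflow and the deepseek-ai/deepseek-moe-16b-base checkpoint name (check the Model Download docs for the exact model IDs):

    # Minimal text-completion sketch; the checkpoint name is an assumption, so
    # check the Model Download docs for the exact IDs of the base and chat variants.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "deepseek-ai/deepseek-moe-16b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,   # bfloat16 as recommended above
        device_map="auto",            # fits on a single 40GB GPU without quantization
        trust_remote_code=True,
    )

    inputs = tokenizer("Mixture-of-Experts models are efficient because", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))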

Highlighted Details

  • Achieves comparable performance to LLaMA2 7B with ~40% of computations.
  • Outperforms models with a similar number of activated parameters by a large margin.
  • Supports fine-tuning with DeepSpeed and QLoRA (4/8-bit); see the sketch after this list.
  • Trained on 2T English and Chinese tokens.
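
For the QLoRA path, the rough sketch below loads the base model in 4-bit with bitsandbytes and attaches LoRA adapters via PEFT; the target_modules names are placeholders, and the repo's own fine-tuning scripts remain the authoritative recipe.

    # Rough QLoRA setup sketch (not the repo's finetune script). The LoRA
    # target_modules names are placeholders; match them to the real module names.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/deepseek-moe-16b-base",   # assumed checkpoint name
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # placeholders, not verified module names
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()         # only the LoRA adapters are trainable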

Maintenance & Community

Licensing & Compatibility

  • License: MIT License for code, separate Model License for model weights.
  • Commercial Use: Permitted under the terms of the Model License.

Limitations & Caveats

  • Fine-tuning scripts recommend specific hardware configurations (e.g., multiple A100 GPUs).
  • The README advises against using system prompts for chat completion with current models.
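
To illustrate that caveat, here is a hedged chat sketch that puts the whole instruction in the user turn and omits any system prompt; it assumes the deepseek-ai/deepseek-moe-16b-chat checkpoint and the Transformers chat-template API.

    # Chat sketch without a system prompt, per the caveat above. The checkpoint
    # name is an assumption; check the Model Download docs for the exact ID.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "deepseek-ai/deepseek-moe-16b-chat"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
    )

    messages = [{"role": "user", "content": "Explain Mixture-of-Experts in two sentences."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
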
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
Star History
96 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

0.1%
5k
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

0.3%
9k
Tiny pretraining project for a 1.1B Llama model
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

ChatGLM-6B by zai-org

0.1%
41k
Bilingual dialogue language model for research
created 2 years ago
updated 1 year ago