DeepSeek-MoE by deepseek-ai

MoE language model for research purposes

Created 1 year ago
1,796 stars

Top 23.9% on SourcePulse

View on GitHub
Project Summary

DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model designed for efficient inference: it matches the quality of dense models while activating only a fraction of its parameters per token. It targets researchers and developers who want high-quality language models at reduced computational cost, and is offered in both base and chat variants.

How It Works

The model uses an innovative MoE architecture built on two ideas: fine-grained expert segmentation, which splits each expert into many smaller ones so the router can combine them more flexibly, and shared expert isolation, which dedicates a few always-active experts to common knowledge so the routed experts can specialize. Because only a small subset of parameters is active for each token, inference takes roughly 40% of the computation of a dense model of comparable capability such as LLaMA2 7B.
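
As a rough illustration of how these two ideas compose, here is a minimal PyTorch sketch of such a layer. The expert counts loosely mirror the configuration described for DeepSeekMoE 16B (2 shared experts, 64 routed experts, top-6 routing), but the layer sizes are hypothetical and this is not the repository's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One small FFN; fine-grained segmentation means many narrow experts."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))

class MoELayer(nn.Module):
    """Shared experts always run; the gate routes each token to its
    top-k fine-grained experts and mixes their outputs."""
    def __init__(self, d_model=512, d_hidden=288, n_routed=64, n_shared=2, top_k=6):
        super().__init__()
        self.shared = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_shared))
        self.routed = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_routed))
        self.gate = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
        out = sum(e(x) for e in self.shared)               # shared experts: no routing
        probs = F.softmax(self.gate(x), dim=-1)            # router distribution
        weights, idx = probs.topk(self.top_k, dim=-1)      # per-token expert choices
        for slot in range(self.top_k):
            for e_id in idx[:, slot].unique().tolist():    # batch tokens per expert
                mask = idx[:, slot] == e_id
                out[mask] = out[mask] + weights[mask, slot, None] * self.routed[e_id](x[mask])
        return out

x = torch.randn(4, 512)
print(MoELayer()(x).shape)  # torch.Size([4, 512])
```

The compute saving falls out of the structure: every token pays for the shared experts plus only top-k routed experts, regardless of how many experts (and total parameters) the layer holds.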

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >= 3.8, PyTorch (bfloat16 recommended), Hugging Face Transformers.
  • Inference: runs on a single GPU with 40GB of memory without quantization (see the sketch below this list).
  • Resources: fine-tuning requires DeepSpeed and potentially multiple A100 GPUs (8x A100 40GB for full fine-tuning, 1x A100 80GB for QLoRA).
  • Docs: Model Download, Quick Start, Evaluation Results
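
For reference, a minimal chat-inference sketch along the lines of the repo's Quick Start, assuming the Hugging Face hub id deepseek-ai/deepseek-moe-16b-chat from the Model Download section:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-moe-16b-chat"  # hub id per the Model Download docs

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16 recommended; fits a single 40GB GPU
    device_map="auto",
    trust_remote_code=True,      # the MoE modeling code ships with the checkpoint
)

# Per the README's caveat, no system prompt: start directly with a user turn.
messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```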

Highlighted Details

  • Achieves performance comparable to LLaMA2 7B with only ~40% of its computation.
  • Outperforms models with a similar number of activated parameters by a large margin.
  • Supports fine-tuning with DeepSpeed and QLoRA (4/8-bit); see the sketch after this list.
  • Trained on 2T English and Chinese tokens.
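
As a sketch of what single-GPU QLoRA loading could look like with the Hugging Face peft and bitsandbytes stack. The target modules and hyperparameters here are hypothetical; the repo's own finetune script defines the actual ones:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization so the 16B model fits on a single 80GB A100.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-moe-16b-base",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Hypothetical LoRA targets (attention projections); the repo's finetune
# script defines the real target modules and hyperparameters.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters train; the base stays 4-bit
```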

Maintenance & Community

  • Last commit: 1 year ago; responsiveness: inactive.
  • Activity (30d): 0 pull requests, 0 issues, 29 new stars.

Licensing & Compatibility

  • License: MIT License for code, separate Model License for model weights.
  • Commercial Use: Permitted under the terms of the Model License.

Limitations & Caveats

  • Fine-tuning scripts recommend specific hardware configurations (e.g., multiple A100 GPUs).
  • The README advises against using system prompts for chat completion with current models.
