DeepSeek-MoE by deepseek-ai

MoE language model for research purposes

created 1 year ago
1,757 stars

Top 25.0% on sourcepulse

View on GitHub
Project Summary

DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model designed for efficient inference and comparable performance to larger dense models. It targets researchers and developers seeking high-quality language models with reduced computational requirements, offering both base and chat variants.

How It Works

The model uses an MoE architecture built around two ideas: fine-grained expert segmentation and shared expert isolation. Because only a small subset of experts is activated for each token, inference requires roughly 40% of the computation of a dense model of similar capability, such as LLaMA2 7B.

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >= 3.8, PyTorch (bfloat16 recommended), Hugging Face Transformers.
  • Inference: Runs on a single GPU with 40GB of memory without quantization (see the sketch after this list).
  • Resources: Fine-tuning requires DeepSpeed and potentially multiple A100 GPUs (8x A100 40GB for full fine-tuning, 1x A100 80G for QLoRA).
  • Docs: Model Download, Quick Start, Evaluation Results
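
A minimal inference sketch, assuming the standard Hugging Face Transformers workflow and the deepseek-ai/deepseek-moe-16b-base checkpoint name (check the Model Download docs for the exact model IDs):

    # Minimal text-completion sketch; the checkpoint name is an assumption, so
    # check the Model Download docs for the exact IDs of the base and chat variants.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "deepseek-ai/deepseek-moe-16b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,   # bfloat16 as recommended above
        device_map="auto",            # fits on a single 40GB GPU without quantization
        trust_remote_code=True,
    )

    inputs = tokenizer("Mixture-of-Experts models are efficient because", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))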

Highlighted Details

  • Achieves comparable performance to LLaMA2 7B with ~40% of computations.
  • Outperforms models with a similar number of activated parameters by a large margin.
  • Supports fine-tuning with DeepSpeed and QLoRA (4/8-bit); see the sketch after this list.
  • Trained on 2T English and Chinese tokens.
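
For the QLoRA path, the rough sketch below loads the base model in 4-bit with bitsandbytes and attaches LoRA adapters via PEFT; the target_modules names are placeholders, and the repo's own fine-tuning scripts remain the authoritative recipe.

    # Rough QLoRA setup sketch (not the repo's finetune script). The LoRA
    # target_modules names are placeholders; match them to the real module names.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "deepseek-ai/deepseek-moe-16b-base",   # assumed checkpoint name
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
    lora_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # placeholders, not verified module names
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()         # only the LoRA adapters are trainable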

Maintenance & Community

Licensing & Compatibility

  • License: MIT License for code, separate Model License for model weights.
  • Commercial Use: Permitted under the terms of the Model License.

Limitations & Caveats

  • Fine-tuning scripts recommend specific hardware configurations (e.g., multiple A100 GPUs).
  • The README advises against using system prompts for chat completion with current models.
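
To illustrate that caveat, here is a hedged chat sketch that puts the whole instruction in the user turn and omits any system prompt; it assumes the deepseek-ai/deepseek-moe-16b-chat checkpoint and the Transformers chat-template API.

    # Chat sketch without a system prompt, per the caveat above. The checkpoint
    # name is an assumption; check the Model Download docs for the exact ID.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "deepseek-ai/deepseek-moe-16b-chat"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
    )

    messages = [{"role": "user", "content": "Explain Mixture-of-Experts in two sentences."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
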
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
Star History
96 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

0.1%
5k
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

0.3%
9k
Tiny pretraining project for a 1.1B Llama model
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 2 more.

ChatGLM-6B by zai-org

0.1%
41k
Bilingual dialogue language model for research
created 2 years ago
updated 1 year ago