DeepSeek-VL2 by deepseek-ai

MoE vision-language model for multimodal understanding

created 7 months ago
4,995 stars

Top 10.1% on sourcepulse

Project Summary

DeepSeek-VL2 is a series of Mixture-of-Experts (MoE) vision-language models designed for advanced multimodal understanding tasks such as visual question answering, OCR, and document analysis. Targeting researchers and developers, it offers competitive performance with efficient parameter activation, in three variants by activated parameters: Tiny (1.0B), Small (2.8B), and the base model (4.5B).

How It Works

DeepSeek-VL2 uses a Mixture-of-Experts architecture that activates only a subset of its parameters on each inference pass. This allows a larger total parameter count while keeping per-token compute low, yielding faster inference than dense models of comparable performance. The models accept multimodal inputs, including multiple images, and support object localization via special tokens.
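
As a concept sketch only (not DeepSeek-VL2's actual code), top-k expert routing in an MoE layer looks roughly like the following: a router scores every expert for each token, and only the k best experts run, so per-token compute stays well below the total parameter count.

    # Illustrative top-k MoE routing; sizes and routing are simplified and
    # do not mirror DeepSeek-VL2's implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        def __init__(self, dim=64, num_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(dim, num_experts)  # scores every expert per token
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                for _ in range(num_experts)
            )

        def forward(self, x):  # x: (num_tokens, dim)
            scores, idx = self.router(x).topk(self.k, dim=-1)  # pick k experts per token
            weights = F.softmax(scores, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.k):  # only k of num_experts ever run per token
                for e in idx[:, slot].unique().tolist():
                    mask = idx[:, slot] == e
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
            return out

    moe = TopKMoE()
    print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])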

Quick Start & Requirements

  • Install dependencies: pip install -e .
  • Requires Python >= 3.8.
  • The inference examples call for about 80GB of GPU memory for deepseek-vl2-small, and more for deepseek-vl2.
  • Incremental prefilling can reduce the memory requirement for deepseek-vl2-small to 40GB (see the command sketch after this list).
  • Official demos and inference scripts are available.
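
Putting the steps above together, a typical session looks like this; the script name and --chunk_size flag follow the repo's README examples and should be verified against the current code:

    # clone and install
    git clone https://github.com/deepseek-ai/DeepSeek-VL2
    cd DeepSeek-VL2
    pip install -e .

    # single-GPU inference; --chunk_size enables incremental prefilling,
    # which is what brings deepseek-vl2-small down to roughly 40GB
    CUDA_VISIBLE_DEVICES=0 python inference.py \
        --model_path "deepseek-ai/deepseek-vl2-small" \
        --chunk_size 512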

Highlighted Details

  • Three model variants by activated parameters: Tiny (1.0B), Small (2.8B), and base (4.5B).
  • Supports multimodal inputs, including multiple images and object localization with bounding-box output (token format sketched below).
  • Achieves competitive or state-of-the-art performance with efficient MoE activation.
  • Offers incremental prefilling for reduced memory usage during inference.
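
The localization interface works through special tokens in the prompt and response. The shape below follows the README's grounding examples; the phrase and coordinates are illustrative (boxes appear to be reported on a normalized 0-999 grid):

    # prompt: wrap the phrase to locate in <|ref|> tags
    <image>
    <|ref|>The giraffe at the back.<|/ref|>

    # response: the phrase echoed back with a bounding box
    <|ref|>The giraffe at the back.<|/ref|><|det|>[[580, 270, 999, 900]]<|/det|>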

Maintenance & Community

The project was released in December 2024. Contact is available via GitHub issues or service@deepseek.com.

Licensing & Compatibility

The code repository is licensed under the MIT License. Model usage is subject to the DeepSeek Model License, which permits commercial use.

Limitations & Caveats

The provided Gradio demo is a basic implementation and may be slow; production environments should consider an optimized deployment solution such as vLLM. The README notes that the larger models require significant GPU memory (80GB+).
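
As one such option, recent vLLM releases list DeepSeek-VL2 among their supported multimodal architectures; a serving sketch follows, with the override flag taken as an assumption to check against current vLLM documentation:

    # assumes a vLLM build with deepseek_vl2 support
    vllm serve deepseek-ai/deepseek-vl2-small \
        --hf-overrides '{"architectures": ["DeepseekVLV2ForCausalLM"]}'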

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 2

Star History

  • 258 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Wei-Lin Chiang (Cofounder of LMArena), and 1 more.

deepsparse by neuralmagic

0%
3k
CPU inference runtime for sparse deep learning models
created 4 years ago
updated 2 months ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

0.1%
5k
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 6 more.

DeepSeek-R1 by deepseek-ai

0.1%
91k
Reasoning models research paper
created 6 months ago
updated 1 month ago