DeepSeek-VL by deepseek-ai

Vision-language model for real-world applications (research paper)

created 1 year ago
3,937 stars

Top 12.6% on sourcepulse

Project Summary

DeepSeek-VL is an open-source vision-language model designed for real-world multimodal understanding tasks. It supports diverse inputs like logical diagrams, web pages, scientific literature, and natural images, targeting researchers and developers building advanced AI applications. The models offer general multimodal capabilities, enabling complex reasoning and interaction with visual and textual data.

How It Works

DeepSeek-VL integrates a vision encoder with a large language model (LLM) to achieve multimodal understanding. It processes images and text through a unified architecture, allowing for tasks like image description, visual question answering, and multi-image reasoning. The models are available in 7B and 1.3B parameter sizes, with both base and chat variants, supporting a sequence length of 4096 tokens.
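As a rough illustration of what "unified architecture" can mean here, the sketch below splices per-image feature vectors into a text embedding sequence so that a language model sees one combined input. It is a toy under stated assumptions, not DeepSeek-VL's actual implementation: the sentinel id `IMAGE_SLOT`, the `splice_image_features` helper, and the two-dimensional embeddings are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
IMAGE_SLOT = -1  # hypothetical sentinel id; not DeepSeek-VL's real token id


def splice_image_features(
    token_ids: Sequence[int],
    embed_text: Callable[[int], Vector],
    image_features: Sequence[Sequence[Vector]],
) -> List[Vector]:
    """Return one embedding sequence in which every image slot is replaced,
    in order, by the feature vectors of the corresponding image."""
    images = iter(image_features)
    sequence: List[Vector] = []
    for tok in token_ids:
        if tok == IMAGE_SLOT:
            try:
                # One slot expands into all feature vectors of one image.
                sequence.extend(next(images))
            except StopIteration:
                raise ValueError("more image slots than images") from None
        else:
            sequence.append(embed_text(tok))
    if next(images, None) is not None:
        raise ValueError("more images than image slots")
    return sequence


def toy_embed(tok: int) -> Vector:
    # Stand-in for a learned text embedding table.
    return [float(tok), 0.0]


# One image represented by two feature vectors, placed between two text tokens.
seq = splice_image_features(
    [5, IMAGE_SLOT, 7], toy_embed, [[[1.0, 1.0], [2.0, 2.0]]]
)
print(len(seq))
```

In this toy the single image slot expands into two vectors, so three input tokens become a four-element sequence. The same expansion is why images count against the sequence length in models built this way.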

Quick Start & Requirements

  • Install from the root of a cloned copy of the repository: pip install -e .
  • Requires Python >= 3.8.
  • Inference requires a CUDA-enabled GPU and torch.bfloat16 support.
  • Official Hugging Face demo available: https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B
  • Gradio demo can be run with: pip install -e .[gradio] and python deepseek_vl/serve/app_deepseek.py
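The requirements above can be checked before downloading any weights. The following is a minimal sketch, assuming PyTorch is the only dependency worth probing; `check_environment` is a hypothetical helper, not part of the repository. It relies on `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`, and skips both if `torch` is not installed.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report whether the interpreter and GPU stack meet the listed requirements."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "bf16_supported": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            # Only meaningful when a CUDA device is actually present.
            report["bf16_supported"] = torch.cuda.is_bf16_supported()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

A `False` for `cuda_available` or `bf16_supported` suggests local inference as described here will not work on that machine; the hosted Hugging Face demo remains an option in that case.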

Highlighted Details

  • Supports multiple images in a single conversation for in-context learning.
  • Offers both base and chat fine-tuned models.
  • Models available in 1.3B and 7B parameter sizes.
  • 4096 token sequence length.
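A multi-image turn might be assembled as in the sketch below. The message layout (the "User" and "Assistant" role names, an "images" list of file paths, and an `<image_placeholder>` marker in the text) is modeled on what the repository's examples appear to use and should be treated as an assumption to verify against the README. `build_conversation` is a hypothetical helper that only builds and validates the structure; it does not call the model.

```python
from typing import Dict, List

IMAGE_TAG = "<image_placeholder>"  # assumed marker string; verify against the README


def build_conversation(prompt: str, image_paths: List[str]) -> List[Dict]:
    """Build a single user turn plus an empty assistant turn.

    The prompt must contain exactly one IMAGE_TAG per image, in the order
    the images should be read.
    """
    n_tags = prompt.count(IMAGE_TAG)
    if n_tags != len(image_paths):
        raise ValueError(
            f"prompt has {n_tags} placeholder(s) but {len(image_paths)} image(s) were given"
        )
    return [
        {"role": "User", "content": prompt, "images": list(image_paths)},
        {"role": "Assistant", "content": ""},  # left empty for the model to fill
    ]


# Hypothetical file names, used only to show the shape of the result.
conversation = build_conversation(
    f"{IMAGE_TAG}First chart. {IMAGE_TAG}Second chart. What changed between them?",
    ["chart_before.png", "chart_after.png"],
)
print(conversation[0]["images"])
```

Validating the placeholder count up front catches the most common mistake with multi-image prompts, a mismatch between markers and files, before any GPU time is spent.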

Licensing & Compatibility

  • Code repository licensed under MIT.
  • Model usage subject to DeepSeek Model License.
  • Commercial use is permitted for both Base and Chat models.

Limitations & Caveats

The provided README does not detail specific performance benchmarks or known limitations of the models. Inference requires a GPU and specific PyTorch data types (bfloat16).
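Because bfloat16 stores each parameter in 2 bytes, a lower bound on GPU memory can be estimated from the nominal model sizes. This is a back-of-the-envelope sketch only: it counts weights alone, uses the rounded "1.3B" and "7B" figures rather than exact parameter counts, and ignores activations, the key-value cache, and framework overhead, all of which add to the real footprint.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Approximate memory for model weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Nominal sizes taken from the model names; exact counts may differ.
nominal_sizes = {"1.3B": 1.3e9, "7B": 7e9}

for name, params in nominal_sizes.items():
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB for weights alone")
```

By this estimate the 7B variant needs roughly 14 GB for weights before any working memory, while the 1.3B variant needs under 3 GB.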

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 145 stars in the last 90 days
