DeepSeek-V3.2-Exp by deepseek-ai

Experimental LLM boosting long-context efficiency

Created 2 weeks ago

889 stars

Top 40.7% on SourcePulse

View on GitHub
Project Summary

Summary

DeepSeek-V3.2-Exp is an experimental large language model release focused on improving long-context processing efficiency. It targets researchers and power users who want to run transformer models over extended text sequences without compromising output quality. Its primary benefit is significantly faster training and inference on long contexts, achieved through a novel sparse attention mechanism.

How It Works

This model introduces DeepSeek Sparse Attention (DSA), a mechanism that achieves fine-grained sparse attention for the first time. DSA is designed to improve computational efficiency during both training and inference in long-context scenarios. The release serves to explore and validate efficiency optimizations within transformer architectures: by reducing the computational overhead of processing extended text sequences, DSA delivers substantial speedups while maintaining model output quality.
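
The page does not describe the mechanism in detail, so the following is only a minimal sketch of the general idea, assuming DSA-style sparsity means each query attends to a small top-k subset of past tokens chosen by a lightweight indexer. The names (sparse_attention, index_scores, top_k) are illustrative rather than the repository's API, and the released kernels (FlashMLA, DeepGEMM, TileLang) implement this far more efficiently.

# Minimal PyTorch sketch of top-k sparse attention (illustrative only; not the
# repository's implementation). Each query keeps the top_k past tokens whose
# indexer scores are highest and attends only to those.
import torch

def sparse_attention(q, k, v, index_scores, top_k=64):
    """q, k, v: [seq, dim]; index_scores: [seq, seq] indexer logits."""
    seq, dim = q.shape
    # Causal mask: a query may only select tokens at or before its own position.
    causal = torch.ones(seq, seq).tril().bool()
    index_scores = index_scores.masked_fill(~causal, float("-inf"))
    # Select the top_k highest-scoring keys per query.
    top_k = min(top_k, seq)
    topk_scores, topk_idx = index_scores.topk(top_k, dim=-1)
    valid = torch.isfinite(topk_scores)          # masked-out selections are -inf
    k_sel, v_sel = k[topk_idx], v[topk_idx]      # [seq, top_k, dim]
    # Standard scaled-dot-product attention, restricted to the selected keys.
    logits = torch.einsum("sd,skd->sk", q, k_sel) / dim ** 0.5
    logits = logits.masked_fill(~valid, float("-inf"))
    return torch.einsum("sk,skd->sd", logits.softmax(dim=-1), v_sel)

if __name__ == "__main__":
    seq, dim = 128, 32
    q, k, v = (torch.randn(seq, dim) for _ in range(3))
    indexer_logits = q @ k.T                     # stand-in for a learned indexer
    out = sparse_attention(q, k, v, indexer_logits, top_k=16)
    print(out.shape)                             # torch.Size([128, 32])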

Quick Start & Requirements

  • HuggingFace: Convert the model weights with convert.py, then run inference via generate.py in the inference folder. MP (model parallelism) must be set according to the GPU count; see the command sketch after this list.
  • SGLang: Docker images are available for H200, MI350, and NPUs (a2, a3). Launch command: python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --page-size 64.
  • vLLM: Offers day-0 support; refer to vLLM recipes for details.
  • Prerequisites: GPU (or NPU) hardware is required for all deployment methods. Specific CUDA versions are not stated; assume the versions typical of current deep-learning frameworks.
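
As a rough command-line sketch of the HuggingFace path above: the flag names are assumptions modeled on DeepSeek's earlier open-source releases rather than taken from this page, so consult the repository's inference folder for the authoritative invocation.

# Sketch only; the flag names below are assumptions, not confirmed by this page.
cd inference
export MP=8                    # model-parallel degree = number of available GPUs
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3.2-Exp \
    --save-path /path/to/converted --model-parallel ${MP}
torchrun --nproc-per-node ${MP} generate.py \
    --ckpt-path /path/to/converted --interactive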

Highlighted Details

  • Achieves performance on par with its predecessor, DeepSeek-V3.1-Terminus, across various public benchmarks.
  • Demonstrates substantial improvements in long-context training and inference efficiency.
  • Maintains virtually identical model output quality compared to V3.1-Terminus.
  • Ships open-source kernels: TileLang kernels prioritizing readability, DeepGEMM kernels for the indexer logits, and FlashMLA kernels for sparse attention.

Maintenance & Community

Contact is available via email at service@deepseek.com or by raising an issue on the repository. No specific community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The model and repository are licensed under the MIT License. This permissive license generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

This is an experimental release ("Exp") and an intermediate step towards next-generation architectures. While performance is comparable to V3.1-Terminus on benchmarks, its experimental nature suggests potential for further iteration or unforeseen issues. Specific performance gains for long-context scenarios are claimed but not quantified with detailed benchmarks in the provided text.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 10
Issues (30d): 22

Star History

894 stars in the last 16 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Pawel Garbacki (cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs
3k stars
Created 2 years ago
Updated 1 year ago