DeepSeek-V3.2-Exp by deepseek-ai

Experimental LLM boosting long-context efficiency

Created 2 weeks ago

889 stars

Top 40.7% on SourcePulse

View on GitHub
Project Summary

Summary

DeepSeek-V3.2-Exp is an experimental large language model release focused on improving long-context processing efficiency. It targets researchers and power users who want to run transformer models over extended text sequences without compromising output quality. Its primary benefit is significantly faster training and inference on long contexts, achieved through a novel sparse attention mechanism.

How It Works

This model introduces DeepSeek Sparse Attention (DSA), a mechanism that achieves fine-grained sparse attention for the first time. DSA is designed to improve computational efficiency during both training and inference in long-context scenarios. The release serves to explore and validate efficiency optimizations within transformer architectures: by reducing the computational overhead of processing extended text sequences, DSA delivers substantial speedups while maintaining model output quality.
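
The page does not describe the mechanism in detail, so the following is only a minimal sketch of the general idea, assuming DSA-style sparsity means each query attends to a small top-k subset of past tokens chosen by a lightweight indexer. The names (sparse_attention, index_scores, top_k) are illustrative rather than the repository's API, and the released kernels (FlashMLA, DeepGEMM, TileLang) implement this far more efficiently.

# Minimal PyTorch sketch of top-k sparse attention (illustrative only; not the
# repository's implementation). Each query keeps the top_k past tokens whose
# indexer scores are highest and attends only to those.
import torch

def sparse_attention(q, k, v, index_scores, top_k=64):
    """q, k, v: [seq, dim]; index_scores: [seq, seq] indexer logits."""
    seq, dim = q.shape
    # Causal mask: a query may only select tokens at or before its own position.
    causal = torch.ones(seq, seq).tril().bool()
    index_scores = index_scores.masked_fill(~causal, float("-inf"))
    # Select the top_k highest-scoring keys per query.
    top_k = min(top_k, seq)
    topk_scores, topk_idx = index_scores.topk(top_k, dim=-1)
    valid = torch.isfinite(topk_scores)          # masked-out selections are -inf
    k_sel, v_sel = k[topk_idx], v[topk_idx]      # [seq, top_k, dim]
    # Standard scaled-dot-product attention, restricted to the selected keys.
    logits = torch.einsum("sd,skd->sk", q, k_sel) / dim ** 0.5
    logits = logits.masked_fill(~valid, float("-inf"))
    return torch.einsum("sk,skd->sd", logits.softmax(dim=-1), v_sel)

if __name__ == "__main__":
    seq, dim = 128, 32
    q, k, v = (torch.randn(seq, dim) for _ in range(3))
    indexer_logits = q @ k.T                     # stand-in for a learned indexer
    out = sparse_attention(q, k, v, indexer_logits, top_k=16)
    print(out.shape)                             # torch.Size([128, 32])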

Quick Start & Requirements

  • HuggingFace: Convert the model weights with convert.py, then run inference via generate.py in the inference folder. MP (model parallelism) must be set according to the GPU count; see the command sketch after this list.
  • SGLang: Docker images are available for H200, MI350, and NPUs (a2, a3). Launch command: python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --dp 8 --page-size 64.
  • vLLM: Offers day-0 support; refer to vLLM recipes for details.
  • Prerequisites: GPU (or NPU) hardware is required for all deployment methods. Specific CUDA versions are not stated; assume the versions typical of current deep-learning frameworks.
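
As a rough command-line sketch of the HuggingFace path above: the flag names are assumptions modeled on DeepSeek's earlier open-source releases rather than taken from this page, so consult the repository's inference folder for the authoritative invocation.

# Sketch only; the flag names below are assumptions, not confirmed by this page.
cd inference
export MP=8                    # model-parallel degree = number of available GPUs
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3.2-Exp \
    --save-path /path/to/converted --model-parallel ${MP}
torchrun --nproc-per-node ${MP} generate.py \
    --ckpt-path /path/to/converted --interactive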

Highlighted Details

  • Achieves performance on par with its predecessor, DeepSeek-V3.1-Terminus, across various public benchmarks.
  • Demonstrates substantial improvements in long-context training and inference efficiency.
  • Maintains virtually identical model output quality compared to V3.1-Terminus.
  • Ships open-source kernels: TileLang kernels prioritizing readability, DeepGEMM kernels for the indexer logits, and FlashMLA kernels for sparse attention.

Maintenance & Community

Contact is available via email at service@deepseek.com or by raising an issue on the repository. No specific community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The model and repository are licensed under the MIT License. This permissive license generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

This is an experimental release ("Exp") and an intermediate step towards next-generation architectures. While performance is comparable to V3.1-Terminus on benchmarks, its experimental nature suggests potential for further iteration or unforeseen issues. Specific performance gains for long-context scenarios are claimed but not quantified with detailed benchmarks in the provided text.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 10
Issues (30d): 22

Star History

894 stars in the last 16 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Pawel Garbacki (cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs
3k stars
Created 2 years ago
Updated 1 year ago