profile-data by deepseek-ai

Profiling data for computation-communication overlap analysis

created 5 months ago
1,082 stars

Top 35.7% on sourcepulse

View on GitHub
Project Summary

This repository provides profiling data from DeepSeek's training and inference framework, aimed at researchers and engineers studying communication-computation overlap strategies in large-scale AI models. It offers insights into low-level implementation details and optimization techniques used in their V3/R1 models.

How It Works

The profiling data was captured with the PyTorch Profiler and can be visualized in chrome://tracing. The training trace demonstrates the overlap strategy for a pair of forward and backward DualPipe chunks containing Mixture of Experts (MoE) layers. The inference traces cover the prefilling and decoding stages, each using micro-batches to overlap computation with all-to-all communication; the specific configuration for each stage is listed under Highlighted Details below.
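For orientation, the sketch below shows how a trace in this format is typically captured with the PyTorch Profiler and exported for chrome://tracing. It is only an illustration: the model, input shapes, and output path are placeholders, and the repository's traces were produced inside DeepSeek's own training/inference framework rather than by this snippet.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder workload; the real traces come from DeepSeek's V3/R1 framework.
model = torch.nn.Linear(4096, 4096)
x = torch.randn(16, 4096)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    model, x = model.cuda(), x.cuda()

# Record CPU/GPU activity for one forward pass.
with profile(activities=activities) as prof:
    model(x)

# Write a Chrome Trace Event JSON that chrome://tracing (or Perfetto) can load.
prof.export_chrome_trace("trace.json")
```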

Quick Start & Requirements

  • Download the profile_data archives.
  • Visualize by opening chrome://tracing in Chrome or Edge and loading the extracted trace files (see the sketch after this list).
  • Requires a modern web browser capable of rendering Chrome Tracing format.
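A minimal sketch for sanity-checking a downloaded trace before loading it in the browser, assuming the extracted archive contains standard Chrome Trace Event JSON files (the file name below is illustrative, not a path from the repository):

```python
import json

# Illustrative path; substitute the actual file extracted from the archive.
with open("train.json") as f:
    trace = json.load(f)

# Chrome traces are either a bare event list or an object with "traceEvents".
events = trace["traceEvents"] if isinstance(trace, dict) else trace
names = {e.get("name") for e in events if isinstance(e, dict)}
print(f"{len(events)} events, {len(names)} distinct op/kernel names")
```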

Highlighted Details

  • Training profile simulates balanced MoE routing with EP64, TP1, and 4K sequence length.
  • Inference prefilling profile uses EP32, TP1, 4K prompt length, and 16K tokens/GPU batch size, overlapping computation with all-to-all communication using two micro-batches.
  • Inference decoding profile uses EP128, TP1, 4K prompt length, and 128 requests/GPU batch size, overlapping computation with all-to-all communication that does not occupy GPU SMs (all three configurations are summarized in the sketch below).
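For quick reference, the three profile configurations above can be collected into a single structure. The field names are this summary's own shorthand, not keys used by the repository; the values are taken directly from the bullets above and the caveats noted later.

```python
# Shorthand summary of the three profiles (field names are illustrative).
PROFILES = {
    "training": {
        "parallelism": {"EP": 64, "TP": 1},
        "sequence_length": "4K",
        "notes": "simulated balanced MoE routing; PP communication excluded",
    },
    "prefilling": {
        "parallelism": {"EP": 32, "TP": 1},
        "prompt_length": "4K",
        "batch_size": "16K tokens per GPU",
        "notes": "two micro-batches overlap compute with all-to-all",
    },
    "decoding": {
        "parallelism": {"EP": 128, "TP": 1},
        "prompt_length": "4K",
        "batch_size": "128 requests per GPU",
        "notes": "all-to-all communication does not occupy SMs",
    },
}
```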

Maintenance & Community

No specific community links or maintenance details are provided in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The training profile assumes simulated, perfectly balanced MoE routing, and the inference profiles reflect specific configurations, so neither may fully represent real-world workloads. Pipeline-parallel (PP) communication is excluded from the training profile for simplicity.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 77 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (Author of SGLang) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

llm-analysis by cli99

CLI tool for LLM latency/memory analysis during training/inference

created 2 years ago
updated 3 months ago
441 stars