profile-data by deepseek-ai

Profiling data for computation-communication overlap analysis

Created 10 months ago
1,132 stars

Top 33.9% on SourcePulse

Project Summary

This repository provides profiling data from DeepSeek's training and inference framework, aimed at researchers and engineers studying communication-computation overlap strategies in large-scale AI models. It offers insights into low-level implementation details and optimization techniques used in their V3/R1 models.

How It Works

The profiling data was captured with the PyTorch Profiler and can be visualized by loading it into chrome://tracing. The training data demonstrates the overlapping strategy for DualPipe forward and backward chunks containing Mixture of Experts (MoE) layers. The inference data covers both the prefilling and decoding stages, each using micro-batches to overlap computation with all-to-all communication under stage-specific configurations.
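To produce a comparable trace from your own workload, a minimal sketch using the PyTorch Profiler is shown below; the toy linear model and the output filename are stand-ins, not DeepSeek's training code:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile CUDA activity only when a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

# Stand-in workload; any forward/backward step can be profiled this way.
model = torch.nn.Linear(1024, 1024)
inputs = torch.randn(64, 1024)

with profile(activities=activities, record_shapes=True) as prof:
    loss = model(inputs).sum()
    loss.backward()

# Writes a JSON trace that chrome://tracing can open directly.
prof.export_chrome_trace("trace.json")
```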

Quick Start & Requirements

  • Download the profile_data archives.
  • Visualize by navigating to chrome://tracing in Chrome (or edge://tracing in Edge) and loading a trace file; a quick programmatic sanity check is sketched after this list.
  • Requires only a Chromium-based browser that can render the Chrome Tracing format; no other tooling is needed.
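Before opening a trace in the browser, a quick programmatic look can confirm the file is a valid Chrome Tracing JSON. A minimal sketch; the filename is hypothetical and should be whichever trace you extracted from the archives:

```python
import json
from collections import Counter

# Hypothetical filename; substitute the trace extracted from the archives.
with open("trace.json") as f:
    trace = json.load(f)

# Chrome Tracing files are either an object with a "traceEvents" list
# or a bare JSON array of events.
events = trace["traceEvents"] if isinstance(trace, dict) else trace

# Count the most frequent event names for a quick overview.
names = Counter(e.get("name", "?") for e in events if isinstance(e, dict))
for name, count in names.most_common(10):
    print(f"{count:8d}  {name}")
```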

Highlighted Details

  • The training profile simulates balanced MoE routing with 64-way expert parallelism (EP64), no tensor parallelism (TP1), and a 4K sequence length.
  • The inference prefilling profile uses EP32, TP1, a 4K prompt length, and a batch size of 16K tokens per GPU, overlapping computation with all-to-all communication across two micro-batches (see the sketch after this list).
  • The inference decoding profile uses EP128, TP1, a 4K prompt length, and a batch size of 128 requests per GPU, overlapping computation with all-to-all communication that does not occupy GPU SMs.
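The two-micro-batch overlap pattern can be illustrated with a minimal torch.distributed sketch. The compute functions and tensor shapes below are placeholders rather than DeepSeek's kernels; only the structure of launching an asynchronous all-to-all, computing on the other micro-batch, then waiting reflects the technique:

```python
import torch
import torch.distributed as dist

# Placeholders for the per-micro-batch computation that hides
# communication latency; in the real system this is attention/MoE work.
def compute_a(x):
    return x * 2

def compute_b(x):
    return x + 1

def overlapped_step(mb0, mb1):
    """Process two micro-batches, hiding all-to-all latency behind compute.

    Assumes torch.distributed is initialized (e.g., via torchrun) and that
    each tensor's first dimension is divisible by the world size.
    """
    # Dispatch micro-batch 0's tokens asynchronously...
    out0 = torch.empty_like(mb0)
    work0 = dist.all_to_all_single(out0, mb0, async_op=True)
    # ...and overlap the transfer with micro-batch 1's computation.
    mb1 = compute_a(mb1)
    work0.wait()
    mb0 = compute_b(out0)

    # Now swap roles: micro-batch 1 communicates while 0 computes.
    out1 = torch.empty_like(mb1)
    work1 = dist.all_to_all_single(out1, mb1, async_op=True)
    mb0 = compute_a(mb0)
    work1.wait()
    mb1 = compute_b(out1)
    return mb0, mb1
```

In the published decoding profile the communication path additionally avoids occupying GPU SMs; that detail depends on the communication library and is not captured by this sketch.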

Maintenance & Community

No specific community links or maintenance details are provided in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The training profile assumes perfectly balanced MoE routing, and the inference profiles reflect specific parallelism and batching configurations, so neither may fully represent real-world workloads. Pipeline-parallel (PP) communication is excluded from the training profiles for simplicity.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0%
475 stars
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 8 months ago
Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

0.2%
858 stars
Pretraining code for depth-recurrent language model research
Created 11 months ago
Updated 1 week ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

1.8%
2k stars
Framework for scaling multimodal model training across accelerators
Created 9 months ago
Updated 1 day ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

0.9%
2k stars
Speculative decoding research paper for faster LLM inference
Created 2 years ago
Updated 3 weeks ago