Profiling data for computation-communication overlap analysis
This repository provides profiling data from DeepSeek's training and inference framework, aimed at researchers and engineers studying communication-computation overlap strategies in large-scale AI models. It offers insights into the low-level implementation details and optimization techniques used for DeepSeek's V3/R1 models.
How It Works
The profiling data is captured using the PyTorch Profiler and can be visualized with chrome://tracing. The training data demonstrates the overlapping strategy for DualPipe training chunks, which feature Mixture of Experts (MoE) layers. The inference profiles cover the prefilling and decoding stages, each of which uses micro-batches to overlap computation with all-to-all communication and has its own stage-specific configuration.
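As a rough illustration of how such a trace could be inspected outside the browser, the sketch below reads a Chrome-trace JSON file and estimates how long compute kernels and communication kernels run at the same time. It relies on the Chrome Trace Event Format fields `ph`, `ts`, and `dur` (timestamps in microseconds). The `"kernel"` category filter, the `COMM_KEYWORDS` substrings, and all function names are assumptions made for this example and are not taken from the repository; check them against the kernel names in an actual trace before relying on the numbers.

```python
import gzip
import json

# Hypothetical substrings used to tag communication kernels; adjust them
# after inspecting the kernel names in an actual trace.
COMM_KEYWORDS = ("nccl", "all_to_all", "alltoall")


def load_trace_events(path):
    """Read a Chrome-trace JSON file (optionally gzipped) and return its events."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The format allows a bare event list or an object with "traceEvents".
    return data["traceEvents"] if isinstance(data, dict) else data


def kernel_intervals(events, is_comm):
    """Collect (start, end) pairs in microseconds for complete ("X") kernel events.

    Assumes GPU kernels carry cat == "kernel"; events without it are skipped.
    """
    out = []
    for ev in events:
        if ev.get("ph") != "X" or ev.get("cat") != "kernel":
            continue
        name = ev.get("name", "").lower()
        if any(k in name for k in COMM_KEYWORDS) == is_comm:
            out.append((ev["ts"], ev["ts"] + ev["dur"]))
    return out


def merge(intervals):
    """Union overlapping intervals so concurrent kernels are not double counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_us(a, b):
    """Total time (microseconds) during which both interval sets are active."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total
```

With a trace loaded via `events = load_trace_events(path)`, calling `overlap_us(kernel_intervals(events, False), kernel_intervals(events, True))` gives the overlapped time; dividing it by the merged communication time gives one possible measure of how much communication is hidden behind computation.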
Quick Start & Requirements
Obtain the profile_data archives from the repository.
Open them with chrome://tracing in Chrome or Edge.
Highlighted Details
Maintenance & Community
No specific community links or maintenance details are provided in the README.
Licensing & Compatibility
The README does not specify a license.
Limitations & Caveats
The training profiles are based on simulated balanced MoE routing, and the inference profiles use specific configurations, so the data may not fully represent all real-world scenarios. Pipeline-parallel (PP) communication is excluded from the training profiles for simplicity.