profile-data by deepseek-ai

Profiling data for computation-communication overlap analysis

Created 10 months ago
1,132 stars

Top 33.9% on SourcePulse

Project Summary

This repository provides profiling data from DeepSeek's training and inference framework, aimed at researchers and engineers studying communication-computation overlap strategies in large-scale AI models. It offers insights into low-level implementation details and optimization techniques used in their V3/R1 models.

How It Works

The profiling data was captured with the PyTorch Profiler and can be visualized by loading it into chrome://tracing. The training data demonstrates the overlapping strategy for DualPipe forward and backward chunks containing Mixture of Experts (MoE) layers. The inference data covers both the prefilling and decoding stages, each using micro-batches to overlap computation with all-to-all communication under stage-specific configurations.
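To produce a comparable trace from your own workload, a minimal sketch using the PyTorch Profiler is shown below; the toy linear model and the output filename are stand-ins, not DeepSeek's training code:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile CUDA activity only when a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

# Stand-in workload; any forward/backward step can be profiled this way.
model = torch.nn.Linear(1024, 1024)
inputs = torch.randn(64, 1024)

with profile(activities=activities, record_shapes=True) as prof:
    loss = model(inputs).sum()
    loss.backward()

# Writes a JSON trace that chrome://tracing can open directly.
prof.export_chrome_trace("trace.json")
```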

Quick Start & Requirements

  • Download the profile_data archives.
  • Visualize by navigating to chrome://tracing in Chrome (or edge://tracing in Edge) and loading a trace file; a quick programmatic sanity check is sketched after this list.
  • Requires only a Chromium-based browser that can render the Chrome Tracing format; no other tooling is needed.
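Before opening a trace in the browser, a quick programmatic look can confirm the file is a valid Chrome Tracing JSON. A minimal sketch; the filename is hypothetical and should be whichever trace you extracted from the archives:

```python
import json
from collections import Counter

# Hypothetical filename; substitute the trace extracted from the archives.
with open("trace.json") as f:
    trace = json.load(f)

# Chrome Tracing files are either an object with a "traceEvents" list
# or a bare JSON array of events.
events = trace["traceEvents"] if isinstance(trace, dict) else trace

# Count the most frequent event names for a quick overview.
names = Counter(e.get("name", "?") for e in events if isinstance(e, dict))
for name, count in names.most_common(10):
    print(f"{count:8d}  {name}")
```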

Highlighted Details

  • The training profile simulates balanced MoE routing with 64-way expert parallelism (EP64), no tensor parallelism (TP1), and a 4K sequence length.
  • The inference prefilling profile uses EP32, TP1, a 4K prompt length, and a batch size of 16K tokens per GPU, overlapping computation with all-to-all communication across two micro-batches (see the sketch after this list).
  • The inference decoding profile uses EP128, TP1, a 4K prompt length, and a batch size of 128 requests per GPU, overlapping computation with all-to-all communication that does not occupy GPU SMs.
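The two-micro-batch overlap pattern can be illustrated with a minimal torch.distributed sketch. The compute functions and tensor shapes below are placeholders rather than DeepSeek's kernels; only the structure of launching an asynchronous all-to-all, computing on the other micro-batch, then waiting reflects the technique:

```python
import torch
import torch.distributed as dist

# Placeholders for the per-micro-batch computation that hides
# communication latency; in the real system this is attention/MoE work.
def compute_a(x):
    return x * 2

def compute_b(x):
    return x + 1

def overlapped_step(mb0, mb1):
    """Process two micro-batches, hiding all-to-all latency behind compute.

    Assumes torch.distributed is initialized (e.g., via torchrun) and that
    each tensor's first dimension is divisible by the world size.
    """
    # Dispatch micro-batch 0's tokens asynchronously...
    out0 = torch.empty_like(mb0)
    work0 = dist.all_to_all_single(out0, mb0, async_op=True)
    # ...and overlap the transfer with micro-batch 1's computation.
    mb1 = compute_a(mb1)
    work0.wait()
    mb0 = compute_b(out0)

    # Now swap roles: micro-batch 1 communicates while 0 computes.
    out1 = torch.empty_like(mb1)
    work1 = dist.all_to_all_single(out1, mb1, async_op=True)
    mb0 = compute_a(mb0)
    work1.wait()
    mb1 = compute_b(out1)
    return mb0, mb1
```

In the published decoding profile the communication path additionally avoids occupying GPU SMs; that detail depends on the communication library and is not captured by this sketch.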

Maintenance & Community

No specific community links or maintenance details are provided in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The training profile assumes perfectly balanced MoE routing, and the inference profiles reflect specific parallelism and batching configurations, so neither may fully represent real-world workloads. Pipeline-parallel (PP) communication is excluded from the training profiles for simplicity.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0%
475 stars
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 8 months ago
Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

0.2%
858 stars
Pretraining code for depth-recurrent language model research
Created 11 months ago
Updated 1 week ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

1.8%
2k stars
Framework for scaling multimodal model training across accelerators
Created 9 months ago
Updated 1 day ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

0.9%
2k stars
Speculative decoding research paper for faster LLM inference
Created 2 years ago
Updated 3 weeks ago