DNN performance profiling toolkit
This repository provides a toolkit for benchmarking and profiling the performance of various deep learning frameworks, including OneFlow, TensorFlow, PyTorch, MXNet, PaddlePaddle, and MindSpore. It aims to offer reproducible, state-of-the-art DNN model implementations optimized for NVIDIA GPU clusters, enabling users to compare training speeds and resource utilization across frameworks.
How It Works
DLPerf evaluates frameworks by training standard DNN models like ResNet-50 and BERT-Base. Benchmarks are conducted across multi-node, multi-device configurations (1-32 GPUs), varying batch sizes, and with/without optimizations like Automatic Mixed Precision (AMP) and XLA. Performance is measured by throughput (samples/second) and latency, with median values reported after ignoring initial training steps to ensure stability.
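The exact log-parsing scripts differ per framework, so the following is only a rough sketch of the reported metric: per-step throughput computed from hypothetical step durations, with an assumed warm-up cutoff (the function name, parameters, and cutoff value are illustrative, not DLPerf's actual implementation).

```python
import statistics

def median_throughput(step_times_sec, batch_size, num_devices, warmup_steps=20):
    """Median samples/second, discarding the first `warmup_steps` steps.

    step_times_sec : per-step wall-clock durations taken from a training log
    batch_size     : per-device batch size used for the run
    num_devices    : total number of GPUs across all nodes
    warmup_steps   : initial steps to ignore (hypothetical cutoff; DLPerf's
                     own scripts choose their own value)
    """
    steady_state = step_times_sec[warmup_steps:]
    per_step = [batch_size * num_devices / t for t in steady_state]
    return statistics.median(per_step)

# Example: 8 GPUs, batch size 128 per GPU, simulated step durations in seconds
if __name__ == "__main__":
    fake_durations = [0.45, 0.44] * 10 + [0.40, 0.41, 0.39, 0.40, 0.41] * 20
    print(f"{median_throughput(fake_durations, 128, 8):.1f} samples/sec")
```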
Quick Start & Requirements
Benchmark scripts, environment requirements, and run instructions are organized in per-framework directories (e.g., OneFlow/, PyTorch/).
Highlighted Details
- Benchmarks ResNet-50 and BERT-Base on NVIDIA V100 GPU clusters, scaling from 1 to 32 GPUs.
- Covers multiple batch sizes, with and without AMP and XLA optimizations.
- Reports median throughput (samples/second) after discarding initial training steps.
Maintenance & Community
The project appears to be actively maintained by Oneflow-Inc. Specific community channels or active contributors beyond the primary organization are not detailed in the README.
Licensing & Compatibility
The repository's licensing is not explicitly stated in the README. However, it references NVIDIA DeepLearningExamples, which are typically under permissive licenses (e.g., Apache 2.0), and framework-specific repositories which have their own licenses. Compatibility for commercial use would depend on the licenses of the underlying frameworks and example code.
Limitations & Caveats
The benchmark results are specific to the tested hardware (NVIDIA V100 GPUs) and configurations. Noted framework-specific limitations include PyTorch's lack of native AMP support in the tested examples and PaddlePaddle's out-of-memory (OOM) failures at certain batch sizes and with DALI integration. Reproducing the exact results may require precise replication of the software environment.