DLPerf by Oneflow-Inc

DNN performance profiling toolkit

created 5 years ago
285 stars

Top 92.8% on sourcepulse

Project Summary

This repository provides a toolkit for benchmarking and profiling the performance of various deep learning frameworks, including OneFlow, TensorFlow, PyTorch, MXNet, PaddlePaddle, and MindSpore. It aims to offer reproducible, state-of-the-art DNN model implementations optimized for NVIDIA GPU clusters, enabling users to compare training speeds and resource utilization across frameworks.

How It Works

DLPerf evaluates frameworks by training standard DNN models such as ResNet-50 and BERT-Base. Benchmarks are run across multi-node, multi-device configurations (1-32 GPUs), varying batch sizes, and with and without optimizations such as Automatic Mixed Precision (AMP) and XLA. Performance is measured as throughput (samples/second) and latency; median values are reported after discarding the initial warm-up steps to ensure stable measurements.
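
For illustration, here is a minimal sketch of how a median-throughput metric of this kind can be computed from per-step timings. The function name, warm-up count, and timings are hypothetical; this is not DLPerf's own measurement script.

    import statistics

    def median_throughput(step_times_s, global_batch_size, warmup_steps=20):
        # Discard the initial warm-up steps, mirroring the practice of
        # ignoring early training iterations before reporting a median.
        steady = step_times_s[warmup_steps:]
        # Per-step throughput in samples/second.
        per_step = [global_batch_size / t for t in steady]
        return statistics.median(per_step)

    # Example with made-up timings: two slow warm-up steps, then ~0.1 s/step.
    timings = [0.35, 0.30] + [0.10] * 100
    print(median_throughput(timings, global_batch_size=128, warmup_steps=2))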

Quick Start & Requirements

  • Install: Clone the repository and follow framework-specific setup instructions within subdirectories (e.g., OneFlow/, PyTorch/).
  • Prerequisites: NVIDIA GPUs, CUDA, and specific framework installations (e.g., OneFlow, TensorFlow 1.x/2.x, PyTorch, MXNet, PaddlePaddle, MindSpore). Multi-node setups require distributed training configurations.
  • Resources: Benchmarking requires significant GPU resources (e.g., 4 nodes with 8x V100 GPUs each) and substantial training time.
  • Links: NVIDIA DeepLearningExamples, OneFlow-Benchmark

Highlighted Details

  • Comprehensive comparison of 7+ major DL frameworks on ResNet-50 and BERT-Base.
  • Detailed performance metrics including FP32 and AMP throughput, speedups, and memory usage (speedup and scaling efficiency are illustrated in the sketch after this list).
  • Benchmarks for specialized models like Wide & Deep Learning (W&D) and InsightFace, highlighting memory efficiency.
  • GPT model performance comparison against Megatron-LM, showing OneFlow's advantages.
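
As a rough illustration of how such speedup figures are commonly derived from reported throughputs, the sketch below computes speedup and scaling efficiency relative to a single-GPU baseline. The throughput numbers are made up and are not taken from the repository's result tables.

    def speedup_and_efficiency(throughput_by_gpus):
        # throughput_by_gpus maps GPU count -> samples/second (hypothetical values).
        # speedup(n) = throughput(n) / throughput(1); efficiency(n) = speedup(n) / n.
        base = throughput_by_gpus[1]
        return {
            n: {"speedup": tp / base, "efficiency": tp / (base * n)}
            for n, tp in sorted(throughput_by_gpus.items())
        }

    # Made-up FP32 throughputs for 1, 8, and 32 GPUs.
    print(speedup_and_efficiency({1: 380.0, 8: 2900.0, 32: 11000.0}))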

Maintenance & Community

The project is maintained by Oneflow-Inc, but activity has slowed: the health metrics below show the last commit was about three years ago, with no pull requests or issues opened in the past 30 days. Specific community channels or active contributors beyond the primary organization are not detailed in the README.

Licensing & Compatibility

The repository's licensing is not explicitly stated in the README. However, it references NVIDIA DeepLearningExamples, which are typically under permissive licenses (e.g., Apache 2.0), and framework-specific repositories which have their own licenses. Compatibility for commercial use would depend on the licenses of the underlying frameworks and example code.

Limitations & Caveats

The benchmark results are specific to the tested hardware (NVIDIA V100 GPUs) and configurations. Some frameworks have limitations noted, such as PyTorch's lack of native AMP support in the tested examples and PaddlePaddle's OOM issues with specific batch sizes and DALI integration. Reproducing exact results may require precise environment replication.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
