DLPerf by Oneflow-Inc

DNN performance profiling toolkit

created 5 years ago
285 stars

Top 92.8% on sourcepulse

Project Summary

This repository provides a toolkit for benchmarking and profiling the performance of various deep learning frameworks, including OneFlow, TensorFlow, PyTorch, MXNet, PaddlePaddle, and MindSpore. It aims to offer reproducible, state-of-the-art DNN model implementations optimized for NVIDIA GPU clusters, enabling users to compare training speeds and resource utilization across frameworks.

How It Works

DLPerf evaluates frameworks by training standard DNN models such as ResNet-50 and BERT-Base. Benchmarks are run across multi-node, multi-device configurations (1-32 GPUs), varying batch sizes, and with and without optimizations such as Automatic Mixed Precision (AMP) and XLA. Performance is measured as throughput (samples/second) and latency; median values are reported after discarding the initial warm-up steps to ensure stable measurements.
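
For illustration, here is a minimal sketch of how a median-throughput metric of this kind can be computed from per-step timings. The function name, warm-up count, and timings are hypothetical; this is not DLPerf's own measurement script.

    import statistics

    def median_throughput(step_times_s, global_batch_size, warmup_steps=20):
        # Discard the initial warm-up steps, mirroring the practice of
        # ignoring early training iterations before reporting a median.
        steady = step_times_s[warmup_steps:]
        # Per-step throughput in samples/second.
        per_step = [global_batch_size / t for t in steady]
        return statistics.median(per_step)

    # Example with made-up timings: two slow warm-up steps, then ~0.1 s/step.
    timings = [0.35, 0.30] + [0.10] * 100
    print(median_throughput(timings, global_batch_size=128, warmup_steps=2))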

Quick Start & Requirements

  • Install: Clone the repository and follow framework-specific setup instructions within subdirectories (e.g., OneFlow/, PyTorch/).
  • Prerequisites: NVIDIA GPUs, CUDA, and specific framework installations (e.g., OneFlow, TensorFlow 1.x/2.x, PyTorch, MXNet, PaddlePaddle, MindSpore). Multi-node setups require distributed training configurations.
  • Resources: Benchmarking requires significant GPU resources (e.g., 4 nodes with 8x V100 GPUs each) and substantial training time.
  • Links: NVIDIA DeepLearningExamples, OneFlow-Benchmark

Highlighted Details

  • Comprehensive comparison of 7+ major DL frameworks on ResNet-50 and BERT-Base.
  • Detailed performance metrics including FP32 and AMP throughput, speedups, and memory usage (speedup and scaling efficiency are illustrated in the sketch after this list).
  • Benchmarks for specialized models like Wide & Deep Learning (W&D) and InsightFace, highlighting memory efficiency.
  • GPT model performance comparison against Megatron-LM, showing OneFlow's advantages.
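
As a rough illustration of how such speedup figures are commonly derived from reported throughputs, the sketch below computes speedup and scaling efficiency relative to a single-GPU baseline. The throughput numbers are made up and are not taken from the repository's result tables.

    def speedup_and_efficiency(throughput_by_gpus):
        # throughput_by_gpus maps GPU count -> samples/second (hypothetical values).
        # speedup(n) = throughput(n) / throughput(1); efficiency(n) = speedup(n) / n.
        base = throughput_by_gpus[1]
        return {
            n: {"speedup": tp / base, "efficiency": tp / (base * n)}
            for n, tp in sorted(throughput_by_gpus.items())
        }

    # Made-up FP32 throughputs for 1, 8, and 32 GPUs.
    print(speedup_and_efficiency({1: 380.0, 8: 2900.0, 32: 11000.0}))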

Maintenance & Community

The project is maintained by Oneflow-Inc, but activity has slowed: the health metrics below show the last commit was about three years ago, with no pull requests or issues opened in the past 30 days. Specific community channels or active contributors beyond the primary organization are not detailed in the README.

Licensing & Compatibility

The repository's licensing is not explicitly stated in the README. However, it references NVIDIA DeepLearningExamples, which are typically under permissive licenses (e.g., Apache 2.0), and framework-specific repositories which have their own licenses. Compatibility for commercial use would depend on the licenses of the underlying frameworks and example code.

Limitations & Caveats

The benchmark results are specific to the tested hardware (NVIDIA V100 GPUs) and configurations. Some frameworks have limitations noted, such as PyTorch's lack of native AMP support in the tested examples and PaddlePaddle's OOM issues with specific batch sizes and DALI integration. Reproducing exact results may require precise environment replication.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
