This repository provides reference implementations for MLPerf™ training benchmarks, targeting ML engineers and researchers seeking to understand or implement standardized machine learning performance tests. It offers a starting point for benchmark implementations, enabling users to evaluate model training performance across various frameworks and hardware.
How It Works
Each benchmark ships with a model implementation in at least one framework, a Dockerfile for containerized execution, dataset download scripts, and timing scripts. This standardizes the benchmarking process and enables reproducible performance comparisons across different hardware and software stacks.
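As a rough illustration of what a single benchmark directory typically contains (the names below are hypothetical; the actual layout and script names vary by benchmark and framework, and each benchmark's README is authoritative):

```sh
# Illustrative layout of one benchmark's reference implementation -- the real
# directory names, scripts, and contents differ per benchmark and framework.
#
# <benchmark>/<framework>/
#   Dockerfile            # container image for reproducible execution
#   download_dataset.sh   # fetches the benchmark's dataset
#   verify_dataset.sh     # checks dataset integrity (where provided)
#   run_and_time.sh       # trains to the target quality and reports timing
#   ...                   # model and training code
```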
Quick Start & Requirements
- Install/Run: Follow the instructions in each benchmark's README. The general flow is to set up Docker and dependencies (e.g., `install_cuda_docker.sh`), download the dataset (`./download_dataset.sh`), and build and run the Docker image; see the sketch after this list.
- Prerequisites: Docker, CUDA (implied by `install_cuda_docker.sh`), framework-specific dependencies (PyTorch, TensorFlow, NeMo, TorchRec, GLT), and large datasets (e.g., LAION-400M-filtered, C4, OpenImages).
- Resources: Benchmarks are compute-intensive and can take considerable time to reach target quality, even on reference hardware.
- Docs: MLPerf Training Benchmark paper
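
A minimal sketch of the general flow, assuming a benchmark that follows the shared script conventions: the repository URL points at the MLCommons training repo, while the benchmark path, image name, and Docker arguments are placeholders, not taken from any specific benchmark's README.

```sh
# Hedged sketch of the typical steps; placeholders in <angle brackets> and the
# Docker invocation are illustrative and differ between benchmarks.
git clone https://github.com/mlcommons/training.git
cd training/<benchmark>/<framework>

# 1. Set up Docker and CUDA dependencies (shared helper script)
source install_cuda_docker.sh

# 2. Download the dataset and, where a script is provided, verify it
./download_dataset.sh
./verify_dataset.sh

# 3. Build the container and run the benchmark; training runs until the target
#    quality is reached, then prints timing results
docker build -t mlperf/<benchmark>:reference .
docker run --gpus all -v /path/to/data:/data mlperf/<benchmark>:reference \
  ./run_and_time.sh
```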
Highlighted Details
- Reference implementations for MLPerf Training v5.0, v4.1, and v4.0 benchmarks.
- Covers diverse models: RetinaNet, Stable Diffusion, BERT, Llama, DLRM, RGAT, GPT-3, and 3D U-Net.
- Supports multiple frameworks: PyTorch, TensorFlow, NeMo, TorchRec, GLT, Paxml, Megatron-LM.
- Includes scripts for dataset download and verification.
Maintenance & Community
- The project describes its reference implementations as "alpha" or "beta" quality and encourages community contributions via issues and pull requests.
- No specific community links (Discord/Slack) or roadmap are provided in the README.
Licensing & Compatibility
- No license is stated in the README snippet summarized here. MLPerf benchmarks are developed by the MLCommons consortium and are intended for broad adoption; the licenses of the underlying frameworks (PyTorch, TensorFlow, etc.) apply to the reference implementations.
Limitations & Caveats
- Reference implementations are not fully optimized and are not intended for "real" performance measurements of software frameworks or hardware.
- Benchmarks can be slow and resource-intensive.
- The project is in an early stage ("alpha" or "beta") and may have quality issues or require significant improvements.