transformers-benchmarks by mli

Transformer training benchmark for GPUs

Created 3 years ago
914 stars

Top 39.9% on SourcePulse

Project Summary

This repository benchmarks the real-world TeraFLOPS achieved when training Transformer models across various NVIDIA GPUs, including multi-GPU and multi-node setups. It targets researchers and engineers who need to estimate training times for large-scale models, providing practical performance data and tools for self-benchmarking.
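As a back-of-envelope illustration of how a measured TFLOPS number turns into a training-time estimate, the sketch below applies the common ~6 × parameters × tokens FLOPs approximation for dense Transformers; the helper function and the sample numbers are illustrative assumptions, not figures from this repository.

```python
def training_days(params: float, tokens: float, tflops_per_gpu: float,
                  num_gpus: int) -> float:
    """Rough wall-clock estimate using the ~6 * params * tokens FLOPs rule."""
    total_flops = 6 * params * tokens                  # fwd + bwd, dense model
    achieved = tflops_per_gpu * 1e12 * num_gpus        # aggregate FLOPs/second
    return total_flops / achieved / 86_400             # seconds -> days

# e.g. a 1.3B-parameter model on 100B tokens at 150 TFLOPS/GPU across 8 GPUs
print(f"~{training_days(1.3e9, 100e9, 150, 8):.1f} days")
```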

How It Works

The project measures TeraFLOPS by executing micro-benchmarks and full Transformer layer forward/backward passes for models like BERT, GPT-2, and T5. It compares achieved performance against theoretical hardware limits, offering insights into how factors such as precision (TF32/FP16), batch size, and GPU architecture affect actual throughput.
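A minimal sketch of the matmul micro-benchmark style used here, timed with CUDA events in PyTorch; the matrix size, repeat count, and dtypes are illustrative choices, not the repo's exact settings.

```python
import torch

def matmul_tflops(n: int = 8192, dtype=torch.float16, reps: int = 50) -> float:
    """Time n x n @ n x n matmuls on the GPU and convert to achieved TFLOPS."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(5):                           # warm-up: autotuning, lazy init
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(reps):
        a @ b
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1e3      # elapsed_time returns ms
    return 2 * n ** 3 * reps / seconds / 1e12    # ~2*n^3 FLOPs per matmul

torch.backends.cuda.matmul.allow_tf32 = True     # use the TF32 path for fp32
print(f"fp16: {matmul_tflops(dtype=torch.float16):.0f} TFLOPS")
print(f"tf32: {matmul_tflops(dtype=torch.float32):.0f} TFLOPS")
```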

Quick Start & Requirements

  • Install/Run: Use the provided NVIDIA PyTorch Docker image (nvcr.io/nvidia/pytorch:22.07-py3).
  • Prerequisites: CUDA-enabled PyTorch, NVIDIA Docker.
  • Setup: Launch the Docker container, then run Jupyter Notebook inside it (a quick environment sanity check is sketched after this list).
  • Links: PyTorch Docker Image
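Before opening the notebooks, it can help to confirm the container actually sees the GPU. A minimal sanity check, assuming the stock PyTorch image above; this snippet is not part of the repo itself:

```python
import torch

# Quick sanity check inside the container before launching the notebooks.
assert torch.cuda.is_available(), "No CUDA device visible -- check nvidia-docker"
print("torch", torch.__version__, "| CUDA", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))
```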

Highlighted Details

  • Benchmarks real TeraFLOPS for Transformer training on A100, A6000, V100, 3090 Ti, and 4090 GPUs.
  • Compares theoretical vs. actual performance for matrix multiplication and full Transformer layers.
  • Includes performance data for both forward and forward+backward passes (a timing sketch follows this list).
  • Provides Jupyter notebooks for users to run their own benchmarks.
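As a rough illustration of the forward vs. forward+backward comparison, the sketch below times a single nn.TransformerEncoderLayer; the BERT-large-like dimensions, batch size, and sequence length are assumptions, not the repo's configuration.

```python
import torch
import torch.nn as nn

def time_layer(backward: bool, reps: int = 20) -> float:
    """Return average milliseconds per iteration for one encoder layer."""
    # BERT-large-like shape: hidden 1024, 16 heads, 4096 FFN (an assumption).
    layer = nn.TransformerEncoderLayer(
        d_model=1024, nhead=16, dim_feedforward=4096, batch_first=True
    ).cuda().half()
    x = torch.randn(32, 512, 1024, device="cuda", dtype=torch.half,
                    requires_grad=backward)
    ctx = torch.enable_grad() if backward else torch.no_grad()
    with ctx:
        layer(x)                                 # warm-up iteration
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(reps):
            y = layer(x)
            if backward:
                y.sum().backward()
        end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / reps        # CUDA event time is in ms

print(f"forward only:       {time_layer(False):6.1f} ms/iter")
print(f"forward + backward: {time_layer(True):6.1f} ms/iter")
```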

Maintenance & Community

No specific community channels or contributor details are listed in the README.

Licensing & Compatibility

The repository's license is not specified in the README.

Limitations & Caveats

Performance figures are specific to the hardware and configurations tested by the authors and may vary significantly with the user's environment, CUDA version, and model implementation.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

  • High-performance C++ LLM inference library
  • Top 0.4% on SourcePulse, 4k stars
  • Created 2 years ago, updated 1 week ago

Starred by Luis Capelo (cofounder of Lightning AI), Alex Yu (research scientist at OpenAI; former cofounder of Luma AI), and 7 more.

TransformerEngine by NVIDIA

  • Library for Transformer model acceleration on NVIDIA GPUs
  • Top 0.4% on SourcePulse, 3k stars
  • Created 3 years ago, updated 22 hours ago