xprof by openxla

ML profiling and performance analysis tool

created 5 years ago
405 stars

Top 72.8% on sourcepulse

Project Summary

This tool provides a profiling and performance analysis suite for JAX, TensorFlow, and PyTorch/XLA, targeting ML engineers and researchers. It helps users understand, debug, and optimize model performance across CPUs, GPUs, and TPUs through detailed visualizations and breakdowns.

How It Works

The profiler integrates as a TensorBoard plugin and offers several analysis tools. The Trace Viewer visualizes execution timelines, the Overview aggregates performance metrics, the Memory Profile Viewer monitors memory usage, and the Graph Viewer displays HLO graph structures. Together, these tools give timeline, aggregate, memory, and graph views of model performance within the familiar TensorBoard environment.
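As a minimal sketch of where the profile data shown by these tools might come from, the snippet below captures a short trace with `jax.profiler.trace`, a JAX API that writes profile data into a log directory. The function name `capture_jax_trace`, the matrix size, and the choice of JAX over TensorFlow or PyTorch/XLA are illustrative assumptions, not something taken from the project's documentation. The function returns False when JAX is not installed.

```python
import importlib.util


def capture_jax_trace(logdir: str) -> bool:
    """Capture a small JAX trace into logdir; return False if JAX is missing."""
    if importlib.util.find_spec("jax") is None:
        return False

    import jax
    import jax.numpy as jnp

    # Work executed inside the context manager is recorded by the profiler.
    with jax.profiler.trace(logdir):
        x = jnp.ones((256, 256))
        # block_until_ready() waits for the asynchronous computation to
        # finish so that it falls inside the traced region.
        (x @ x).block_until_ready()
    return True
```

Pointing `tensorboard --logdir` at the same directory should then make the captured run available to the profiler tools.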

Quick Start & Requirements

  • Install: pip install tbp-nightly (for the latest version).
  • Prerequisites: TensorFlow >= 2.18.0, TensorBoard >= 2.18.0, and tensorboard-plugin-profile >= 2.18.0. GPU profiling also needs NVIDIA drivers and the CUDA Toolkit (for example, CUDA 12.5 requires driver version 525.60.13 or later), plus CUPTI 10.1. Internet access is required to load the Google Chart libraries.
  • Run: tensorboard --logdir=profiler/demo
  • Demo: Colab Demo
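The version prerequisites above can be checked before launching TensorBoard. The following is a rough sketch using only the Python standard library. The helper names (`parse_version`, `meets_minimum`, `check_prerequisites`) are hypothetical, and the version parsing is deliberately simplified: it compares only the first three numeric components and treats pre-release suffixes such as `rc1` as equal to the final release.

```python
from importlib import metadata

# Minimum versions as listed in the prerequisites above.
MINIMUMS = {
    "tensorflow": "2.18.0",
    "tensorboard": "2.18.0",
    "tensorboard-plugin-profile": "2.18.0",
}


def parse_version(text: str) -> tuple:
    """Turn '2.18.0' into (2, 18, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.18' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings component by component."""
    return parse_version(installed) >= parse_version(minimum)


def check_prerequisites(minimums=MINIMUMS) -> dict:
    """Map each package to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_prerequisites().items():
        print(f"{package}: {status}")
```

The lookup goes by distribution name, so a nightly build installed under a different name (such as tbp-nightly) would be reported as missing by this check even when the plugin itself is present.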

Highlighted Details

  • Supports JAX, TensorFlow, and PyTorch/XLA.
  • Analyzes performance on CPUs, GPUs, and TPUs.
  • Includes Overview, Trace Viewer, Memory Profile Viewer, and Graph Viewer.
  • Requires internet access for full functionality of some visualizations.

Maintenance & Community

The project follows TensorFlow's versioning scheme. The README links to guides for profiling with JAX, TensorFlow, and Cloud TPU.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Offline usage may result in missing charts and tables. Multi-worker GPU profiling requires independent analysis of each worker. Cloud TPU profiling necessitates Google Cloud TPU access.
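Since each worker's GPU profile has to be examined on its own, it can help to see which hosts contributed data to a run. The sketch below assumes the directory layout that the TensorBoard profile plugin conventionally reads, `<logdir>/plugins/profile/<run>/<host>.xplane.pb`. That layout and the helper name `list_profile_hosts` are assumptions here, not something stated in the project summary.

```python
from pathlib import Path

# Assumed file suffix for per-host profile data.
XPLANE_SUFFIX = ".xplane.pb"


def list_profile_hosts(logdir: str) -> dict:
    """Return {run_name: sorted host names} for profiles found under logdir."""
    profile_root = Path(logdir) / "plugins" / "profile"
    runs = {}
    if not profile_root.is_dir():
        return runs
    for run_dir in sorted(p for p in profile_root.iterdir() if p.is_dir()):
        # Strip the suffix so only the host part of each file name remains.
        hosts = sorted(
            f.name[: -len(XPLANE_SUFFIX)]
            for f in run_dir.iterdir()
            if f.is_file() and f.name.endswith(XPLANE_SUFFIX)
        )
        if hosts:
            runs[run_dir.name] = hosts
    return runs
```

A run that lists several hosts would then be reviewed one host at a time in the profiler.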

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 93
  • Issues (30d): 5

Star History

33 stars in the last 90 days
