xprof by openxla

ML profiling and performance analysis tool

Created 5 years ago
423 stars

Top 69.6% on SourcePulse

Project Summary

This tool provides a profiling and performance analysis suite for JAX, TensorFlow, and PyTorch/XLA, targeting ML engineers and researchers. It helps users understand, debug, and optimize model performance across CPUs, GPUs, and TPUs through detailed visualizations and breakdowns.

How It Works

The profiler integrates as a TensorBoard plugin, offering several analysis tools. It visualizes execution timelines (Trace Viewer), aggregates performance metrics (Overview), monitors memory usage (Memory Profile Viewer), and displays HLO graph structures (Graph Viewer). This approach allows for a comprehensive, multi-faceted view of model performance within a familiar TensorBoard environment.
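Before any of these views has data to show, a profile has to be written to a log directory. The following is a minimal sketch of one way to do that from JAX, not the project's prescribed workflow. It assumes JAX is installed and uses `jax.profiler.start_trace` and `jax.profiler.stop_trace`; the helper name `capture_profile` and its `profiler` parameter are hypothetical and exist only so the sketch can be exercised without JAX.

```python
from typing import Any, Callable, Optional


def capture_profile(
    logdir: str,
    workload: Callable[[], Any],
    profiler: Optional[Any] = None,
) -> Any:
    """Run `workload` between start_trace and stop_trace, writing to `logdir`.

    `profiler` is any object with start_trace(logdir) and stop_trace().
    When omitted, jax.profiler is used (assumption: JAX is installed).
    """
    if profiler is None:
        # Imported lazily so the helper can be defined without JAX present.
        import jax.profiler as profiler
    profiler.start_trace(logdir)
    try:
        return workload()
    finally:
        # Always stop the trace, even if the workload raises.
        profiler.stop_trace()
```

With JAX available, a call such as `capture_profile("profiler/demo", lambda: step(x).block_until_ready())` would record one invocation of a hypothetical `step` function; pointing TensorBoard at the same directory should then surface the trace in the plugin.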

Quick Start & Requirements

  • Install: pip install tbp-nightly (for the latest version).
  • Prerequisites: TensorFlow >= 2.18.0, TensorBoard >= 2.18.0, tensorboard-plugin-profile >= 2.18.0. GPU profiling also needs NVIDIA GPU drivers and the CUDA Toolkit (e.g., CUDA 12.5 requires driver version 525.60.13 or later), along with CUPTI 10.1. Internet access is required to load the Google Charts libraries.
  • Run: tensorboard --logdir=profiler/demo
  • Demo: Colab Demo
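The version prerequisites above can be checked before launching TensorBoard. The sketch below uses only the standard library (`importlib.metadata`); the helper names `parse_version`, `meets_minimum`, and `check_prerequisites` are hypothetical, and the version parsing is deliberately simplistic (numeric components only). Note one assumption: nightly builds are typically published under different distribution names (for example `tbp-nightly`), so they would be reported as missing unless the mapping is adjusted.

```python
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions as listed in the prerequisites above.
MINIMUMS: Dict[str, str] = {
    "tensorflow": "2.18.0",
    "tensorboard": "2.18.0",
    "tensorboard-plugin-profile": "2.18.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.18.0' into (2, 18, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '0rc1': keep the 0, ignore the suffix
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric components, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_prerequisites(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Report 'ok', 'missing', or a 'too old' message for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_prerequisites().items():
        print(f"{name}: {status}")
```

Pre-release suffixes are ignored, so a release candidate of 2.18.0 would count as meeting the minimum; a stricter check could use a dedicated version-parsing library instead.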

Highlighted Details

  • Supports JAX, TensorFlow, and PyTorch/XLA.
  • Analyzes performance on CPUs, GPUs, and TPUs.
  • Includes Overview, Trace Viewer, Memory Profile Viewer, and Graph Viewer.
  • Requires internet access for full functionality of some visualizations.

Maintenance & Community

The project follows TensorFlow's versioning scheme. Links to guides for JAX, TensorFlow, and Cloud TPU profiling are provided.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Offline usage may result in missing charts and tables. Multi-worker GPU profiling requires independent analysis of each worker. Cloud TPU profiling necessitates Google Cloud TPU access.
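Because each GPU worker has to be analyzed on its own, one workable arrangement is to run a separate TensorBoard instance per worker. The sketch below only builds the command lines. It assumes a hypothetical layout in which every worker wrote its profile to its own subdirectory of a common root, and the helper name `per_worker_commands` is illustrative; `--logdir` and `--port` are standard `tensorboard` flags.

```python
import shlex
from typing import List


def per_worker_commands(
    root: str, workers: List[str], base_port: int = 6006
) -> List[str]:
    """Build one `tensorboard` command per worker, each on its own port."""
    commands = []
    for offset, worker in enumerate(workers):
        # Hypothetical layout: <root>/<worker>/ holds that worker's profile.
        logdir = f"{root.rstrip('/')}/{worker}"
        commands.append(
            f"tensorboard --logdir={shlex.quote(logdir)} --port={base_port + offset}"
        )
    return commands


if __name__ == "__main__":
    for command in per_worker_commands("profiles", ["worker0", "worker1"]):
        print(command)
```

Each printed command can be run in its own terminal, giving one browser tab per worker.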

Health Check

  • Last commit: 14 hours ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 61
  • Issues (30d): 2
  • Star history: 11 stars in the last 30 days
