LLM analysis tool for inference performance on diverse hardware
LLM-Viewer is a tool designed to analyze and visualize the inference performance of Large Language Models (LLMs) across various hardware platforms. It targets researchers and engineers seeking to understand and optimize LLM inference by examining computation, memory, and transmission aspects through a user-friendly interface and a hardware roofline model.
How It Works
LLM-Viewer analyzes LLMs by gathering layer-specific information (computation, tensor shapes, dependencies), defining hardware capabilities (compute capacity, memory bandwidth), and configuring inference parameters (batch size, sequence length). It then applies a roofline model to estimate layer and network performance, track memory usage, and identify bottlenecks, providing insights into factors influencing inference.
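The roofline estimate described above can be sketched in a few lines. This is a minimal illustration of the general technique, not LLM-Viewer's actual implementation; the layer size, peak-FLOPs, and bandwidth numbers are assumed for the example.

```python
def roofline_time(flops, bytes_moved, peak_flops, bandwidth):
    """Estimate a layer's runtime with a simple roofline model.

    A layer is compute-bound when its arithmetic intensity (FLOPs per
    byte) exceeds the hardware ridge point (peak_flops / bandwidth);
    otherwise it is memory-bound and limited by data movement.
    """
    compute_time = flops / peak_flops      # time if compute-bound (s)
    memory_time = bytes_moved / bandwidth  # time if memory-bound (s)
    return max(compute_time, memory_time)

# Illustrative example: one 4096x4096 fp16 linear layer during
# single-token decoding (batch size 1) on a hypothetical accelerator
# with 312 TFLOP/s peak compute and 2 TB/s memory bandwidth.
flops = 2 * 4096 * 4096          # multiply-accumulate operations
bytes_moved = 4096 * 4096 * 2    # weight bytes (fp16), dominant traffic
t = roofline_time(flops, bytes_moved, peak_flops=312e12, bandwidth=2e12)
# Here memory_time dwarfs compute_time, so this decode-phase layer
# is memory-bandwidth bound, not compute bound.
```

This mirrors the common observation that batch-1 decoding is dominated by weight traffic, which is exactly the kind of bottleneck a roofline analysis surfaces.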
Quick Start & Requirements
pip install transformers flask flask_cors easydict
Model definitions are loaded through the transformers library.
python3 analyze_cli.py <model_name> <hardware_name> [options]
Maintenance & Community
This is an ongoing project; its TODO list indicates planned enhancements. The README provides no community channels or contributor guidelines.
Licensing & Compatibility
The repository is available on GitHub under an unspecified license. The README does not detail licensing restrictions or compatibility for commercial use.
Limitations & Caveats
The tool's time estimations represent theoretical performance and should only be used for relative comparisons. Support for LLMs and hardware configurations is expanding, with some features (e.g., non-transformer layers, full network visualization) still under development.