cli99/llm-analysis: CLI tool for LLM latency/memory analysis during training/inference
This project provides a Python library for estimating the latency and memory usage of Transformer models during training and inference. It targets researchers and engineers who need to evaluate different LLM configurations, hardware setups, and parallelism strategies on paper, before committing to expensive runs, in order to optimize system performance and cost.
How It Works
The library models latency and memory based on user-defined configurations for model architecture, GPU specifications, data types, and parallelism schemes (Tensor, Pipeline, Sequence, Expert, Data Parallelism). It leverages formulas and equations commonly found in research papers, automating calculations that would otherwise be done manually. The approach allows for theoretical "what-if" analysis to understand the impact of various optimizations like quantization or parallelism.
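To make the kind of calculation the library automates concrete, here is a minimal back-of-envelope sketch. The formulas (roughly 2 FLOPs per parameter per token for compute-bound prefill, one full weight read per token for memory-bound decode, weights sharded across tensor-parallel ranks) are standard rules of thumb from the literature; the function name, parameters, and hardware numbers are illustrative assumptions, not llm-analysis's actual API or defaults.

```python
# Illustrative estimate of per-GPU weight memory and latency under
# tensor parallelism. All names and numbers here are assumptions for
# demonstration; they are not the llm-analysis API.

def estimate(num_params, dtype_bytes, tp_size, batch_size, seq_len,
             gpu_peak_flops, gpu_mem_bandwidth, flops_efficiency=0.5):
    """Rough weight memory (GB/GPU), prefill latency (s), decode latency (s/token)."""
    # Weights are sharded evenly across tensor-parallel ranks.
    weight_mem_gb = num_params * dtype_bytes / tp_size / 1e9

    # Prefill is compute-bound: ~2 FLOPs per parameter per processed token.
    prefill_s = (2 * num_params * batch_size * seq_len
                 / (gpu_peak_flops * tp_size * flops_efficiency))

    # Decode is memory-bound: each step reads the (sharded) weights once.
    per_token_decode_s = (num_params * dtype_bytes / tp_size
                          / gpu_mem_bandwidth)
    return weight_mem_gb, prefill_s, per_token_decode_s

# What-if example: a 70B-parameter model in fp16 on 4 A100-like GPUs
# (assumed: 312 TFLOPS peak compute, 2.0 TB/s HBM bandwidth per GPU).
mem, prefill, decode = estimate(
    num_params=70e9, dtype_bytes=2, tp_size=4,
    batch_size=1, seq_len=512,
    gpu_peak_flops=312e12, gpu_mem_bandwidth=2.0e12)
print(f"{mem:.1f} GB/GPU, prefill {prefill * 1e3:.0f} ms, "
      f"{decode * 1e3:.1f} ms/token")
```

Re-running the same sketch with dtype_bytes=1 (int8 quantization) or a larger tp_size shows immediately how such optimizations trade off memory against latency, which is the "what-if" workflow the library formalizes.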
Quick Start & Requirements
- Install from PyPI: pip install llm-analysis
- Install from source: pip install . or poetry install
- Retrieving model configurations by name requires the transformers library.

Highlighted Details
Maintenance & Community
- Uses pre-commit for code formatting.

Licensing & Compatibility
Limitations & Caveats
Last updated: 6 months ago (marked Inactive).