llm-inference-calculator by alexziskind1

LLM inference hardware calculator

Created 1 year ago
256 stars

Top 98.5% on SourcePulse

View on GitHub
Project Summary

A web-based calculator that estimates hardware requirements for Large Language Model (LLM) inference. It targets engineers and researchers who need to plan VRAM, system RAM, and GPU configurations, simplifying hardware procurement and deployment decisions.

How It Works

Built with React, TypeScript, and Vite, the tool employs a direct calculation approach. It takes user inputs for model size (parameters), quantization method (e.g., FP32, FP16, INT8, INT4), context length, and KV cache settings to derive hardware needs. The architecture supports estimations for both discrete GPU setups and unified memory systems, providing a clear overview of resource demands.
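The core arithmetic behind such an estimate can be sketched in a few lines. This is a minimal TypeScript sketch of the kind of calculation described above; the function names, the bytes-per-parameter table, and the example model dimensions are assumptions for illustration, not the project's actual code.

```typescript
type Quantization = "FP32" | "FP16" | "INT8" | "INT4";

// Bytes per parameter at each quantization level.
const BYTES_PER_PARAM: Record<Quantization, number> = {
  FP32: 4,
  FP16: 2,
  INT8: 1,
  INT4: 0.5,
};

// Weight memory in GB for a model with `paramsB` billion parameters.
function weightMemoryGB(paramsB: number, quant: Quantization): number {
  return paramsB * BYTES_PER_PARAM[quant];
}

// KV cache in GB: two tensors (K and V) per layer, one vector of size
// `hiddenDim` per token, stored at the cache's precision.
function kvCacheGB(
  layers: number,
  contextLen: number,
  hiddenDim: number,
  cacheQuant: Quantization
): number {
  return (2 * layers * contextLen * hiddenDim * BYTES_PER_PARAM[cacheQuant]) / 1e9;
}

// Example: a 7B-parameter model at FP16 with a 4096-token context,
// assuming 32 layers and a 4096-wide hidden state (Llama-7B-like).
const weights = weightMemoryGB(7, "FP16");     // 14 GB
const kv = kvCacheGB(32, 4096, 4096, "FP16");  // ~2.15 GB
console.log(`weights: ${weights} GB, KV cache: ${kv.toFixed(2)} GB`);
```

Note how quantization dominates the weight term (INT4 quarters the FP16 footprint), while the KV cache grows linearly with context length, which is why both appear as inputs to the calculator.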

Quick Start & Requirements

  • Install/Run: Use npm install and npm run dev for development, or npm run build for production.
  • Docker: Create a .env file from .env.example, set PORT, then run docker compose up -d --build.
  • Prerequisites: Node.js and npm are required for local development. Docker and Docker Compose are needed for containerized deployment.
  • Assumptions: Discrete GPU calculations default to 24GB VRAM cards (e.g., RTX 3090/4090). Unified memory calculations assume up to 75% of system RAM is available for VRAM.
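The two default assumptions above can be expressed directly. This is a hypothetical sketch of how those defaults translate into numbers; the function names are illustrative, not taken from the repository.

```typescript
// Discrete GPUs: how many cards are needed to hold a given VRAM estimate,
// defaulting to 24GB cards (e.g. RTX 3090/4090), per the stated assumption.
function gpusNeeded(requiredVramGB: number, cardVramGB: number = 24): number {
  return Math.ceil(requiredVramGB / cardVramGB);
}

// Unified memory: assume at most 75% of system RAM is usable as VRAM.
function unifiedVramAvailableGB(systemRamGB: number): number {
  return 0.75 * systemRamGB;
}

// Example: a ~30 GB estimate needs two 24GB cards, while a 64GB
// unified-memory machine exposes about 48 GB for the model.
console.log(gpusNeeded(30));              // 2
console.log(unifiedVramAvailableGB(64));  // 48
```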

Highlighted Details

  • Comprehensive VRAM calculation considering model size, quantization levels (FP32/FP16/INT8/INT4), context length, and KV cache.
  • Estimates required VRAM, minimum system RAM, on-disk model size, and the number of GPUs needed.
  • Supports both discrete GPU and unified memory system configurations.

Maintenance & Community

No specific details on contributors, community channels, or roadmap are provided in the README.

Licensing & Compatibility

The project is released under the MIT License, which is highly permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

All calculations are approximations; actual requirements vary with the specific LLM implementation and runtime environment. Discrete GPU estimates assume a fixed 24GB VRAM card.

Health Check
Last Commit

5 days ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
0
Star History
35 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

0.1%
4k
High-performance C++ LLM inference library
Created 2 years ago
Updated 2 days ago