llm-inference-calculator by alexziskind1

LLM inference hardware calculator

Created 1 year ago
256 stars

Top 98.5% on SourcePulse

View on GitHub
Project Summary

A web-based calculator that estimates hardware requirements for Large Language Model (LLM) inference. It targets engineers and researchers who need to plan VRAM, system RAM, and GPU configurations, simplifying hardware procurement and deployment decisions.

How It Works

Built with React, TypeScript, and Vite, the tool employs a direct calculation approach. It takes user inputs for model size (parameters), quantization method (e.g., FP32, FP16, INT8, INT4), context length, and KV cache settings to derive hardware needs. The architecture supports estimations for both discrete GPU setups and unified memory systems, providing a clear overview of resource demands.
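The core arithmetic behind such an estimate can be sketched in a few lines. This is a minimal TypeScript sketch of the kind of calculation described above; the function names, the bytes-per-parameter table, and the example model dimensions are assumptions for illustration, not the project's actual code.

```typescript
type Quantization = "FP32" | "FP16" | "INT8" | "INT4";

// Bytes per parameter at each quantization level.
const BYTES_PER_PARAM: Record<Quantization, number> = {
  FP32: 4,
  FP16: 2,
  INT8: 1,
  INT4: 0.5,
};

// Weight memory in GB for a model with `paramsB` billion parameters.
function weightMemoryGB(paramsB: number, quant: Quantization): number {
  return paramsB * BYTES_PER_PARAM[quant];
}

// KV cache in GB: two tensors (K and V) per layer, one vector of size
// `hiddenDim` per token, stored at the cache's precision.
function kvCacheGB(
  layers: number,
  contextLen: number,
  hiddenDim: number,
  cacheQuant: Quantization
): number {
  return (2 * layers * contextLen * hiddenDim * BYTES_PER_PARAM[cacheQuant]) / 1e9;
}

// Example: a 7B-parameter model at FP16 with a 4096-token context,
// assuming 32 layers and a 4096-wide hidden state (Llama-7B-like).
const weights = weightMemoryGB(7, "FP16");     // 14 GB
const kv = kvCacheGB(32, 4096, 4096, "FP16");  // ~2.15 GB
console.log(`weights: ${weights} GB, KV cache: ${kv.toFixed(2)} GB`);
```

Note how quantization dominates the weight term (INT4 quarters the FP16 footprint), while the KV cache grows linearly with context length, which is why both appear as inputs to the calculator.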

Quick Start & Requirements

  • Install/Run: Use npm install and npm run dev for development, or npm run build for production.
  • Docker: Create a .env file from .env.example, set PORT, then run docker compose up -d --build.
  • Prerequisites: Node.js and npm are required for local development. Docker and Docker Compose are needed for containerized deployment.
  • Assumptions: Discrete GPU calculations default to 24GB VRAM cards (e.g., RTX 3090/4090). Unified memory calculations assume up to 75% of system RAM is available for VRAM.
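The two default assumptions above can be expressed directly. This is a hypothetical sketch of how those defaults translate into numbers; the function names are illustrative, not taken from the repository.

```typescript
// Discrete GPUs: how many cards are needed to hold a given VRAM estimate,
// defaulting to 24GB cards (e.g. RTX 3090/4090), per the stated assumption.
function gpusNeeded(requiredVramGB: number, cardVramGB: number = 24): number {
  return Math.ceil(requiredVramGB / cardVramGB);
}

// Unified memory: assume at most 75% of system RAM is usable as VRAM.
function unifiedVramAvailableGB(systemRamGB: number): number {
  return 0.75 * systemRamGB;
}

// Example: a ~30 GB estimate needs two 24GB cards, while a 64GB
// unified-memory machine exposes about 48 GB for the model.
console.log(gpusNeeded(30));              // 2
console.log(unifiedVramAvailableGB(64));  // 48
```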

Highlighted Details

  • Comprehensive VRAM calculation considering model size, quantization levels (FP32/FP16/INT8/INT4), context length, and KV cache.
  • Estimates required VRAM, minimum system RAM, on-disk model size, and the number of GPUs needed.
  • Supports both discrete GPU and unified memory system configurations.

Maintenance & Community

No specific details on contributors, community channels, or roadmap are provided in the README.

Licensing & Compatibility

The project is released under the MIT License, which is highly permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

All calculations are approximations; actual requirements vary with the specific LLM implementation and runtime environment. Discrete GPU estimates assume a fixed 24GB VRAM card.

Health Check
Last Commit

5 days ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
0
Star History
35 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

0.1%
4k
High-performance C++ LLM inference library
Created 2 years ago
Updated 2 days ago