llmfit by AlexsJones

LLM model selection and optimization tool for local hardware

Created 1 month ago
22,473 stars

Top 2.1% on SourcePulse

Project Summary

llmfit is a terminal tool that tackles the challenge of running LLMs locally: a single command reports which of 94 models from 30 providers will run optimally on your specific hardware. It automatically detects system RAM, CPU, and GPU capabilities, then scores models across quality, speed, fit, and context dimensions. An interactive TUI and a classic CLI simplify finding suitable LLMs for your machine, and advanced features cover multi-GPU setups and MoE architectures.

How It Works

The tool probes system hardware (RAM, CPU, GPUs) and detects acceleration backends (CUDA, Metal, ROCm, SYCL, or plain CPU). It ships with a compiled-in database of 94 HuggingFace models. llmfit estimates memory requirements, with explicit support for Mixture-of-Experts (MoE) models: VRAM is calculated from the active experts rather than the total parameter count. It applies dynamic quantization, selecting the highest-quality quantization level (from Q8_0 down to Q2_K) that fits available memory. Models are scored across Quality, Speed, Fit, and Context dimensions, with weights adjusted per use case, yielding a composite ranking score.
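The memory estimation, quantization fallback, and weighted scoring described above can be sketched roughly as follows. This is an illustrative sketch only: the bytes-per-parameter table, the overhead constant, and the scoring weights are assumptions for exposition, not llmfit's actual database, formulas, or defaults.

```python
# Illustrative sketch of MoE-aware sizing, quantization selection, and
# composite scoring. All constants here are assumed, not llmfit's real values.

# Approximate bytes per parameter for common GGUF quantization levels,
# ordered from highest quality (Q8_0) down to lowest (Q2_K).
BYTES_PER_PARAM = {
    "Q8_0": 1.06,
    "Q6_K": 0.82,
    "Q5_K_M": 0.69,
    "Q4_K_M": 0.56,
    "Q3_K_M": 0.44,
    "Q2_K": 0.35,
}

def estimate_vram_gb(active_params_b, quant, overhead_gb=1.5):
    """MoE-aware memory estimate in GB.

    active_params_b is the count of *active* parameters in billions; for a
    dense model this equals the total parameter count. The fixed overhead
    term (KV cache, runtime buffers) is an assumed constant.
    """
    # billions of params x bytes/param == GB of weights
    return active_params_b * BYTES_PER_PARAM[quant] + overhead_gb

def best_quant(active_params_b, vram_gb):
    """Return the highest-quality quantization that fits, else None."""
    for quant in BYTES_PER_PARAM:  # dicts preserve insertion order
        if estimate_vram_gb(active_params_b, quant) <= vram_gb:
            return quant
    return None

def composite_score(scores, weights):
    """Weighted sum over Quality/Speed/Fit/Context dimensions."""
    return sum(scores[dim] * weights[dim] for dim in weights)

# Example: a model with 7B active parameters on an 8 GB GPU.
print(best_quant(7, 8))
```

The key MoE point is that an MoE model is sized by its active-parameter count: a model with a large total parameter count but only a fraction of experts active per token is scored against the smaller active figure, which is why such models can fit on hardware their headline size would seem to rule out.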

Quick Start & Requirements

Installation is straightforward across macOS, Linux, and Windows:

  • Quick install (macOS/Linux): curl -fsSL https://llmfit.axjns.dev/install.sh | sh
  • Homebrew: brew tap AlexsJones/llmfit && brew install llmfit
  • Cargo: cargo install llmfit (requires Rust via rustup)
  • From source: clone the repo and run cargo build --release

Highlighted Details

  • Interactive TUI and classic CLI modes for model discovery.
  • Automated hardware and acceleration backend detection.
  • Accurate VRAM estimation for MoE architectures.
  • Dynamic quantization for optimal model fit and quality.
  • Multi-dimensional scoring tailored to LLM use-cases.
  • JSON output for agent and scripting integration.
  • OpenClaw skill for hardware-aware local model recommendations and configuration.

Maintenance & Community

Contributions are welcome, especially for expanding the model database. An automated script (make update-models) refreshes the HuggingFace model list. The llmfit-advisor skill integrates with the OpenClaw agent ecosystem.

Licensing & Compatibility

llmfit is released under the permissive MIT license, which allows commercial use and integration into closed-source projects.

Limitations & Caveats

Speed metrics are estimates, not real-world benchmarks. VRAM may be reported as unknown for some AMD GPUs. The model database is embedded in the binary and must be updated manually via the provided scripts.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 170
  • Issues (30d): 34
  • Star History: 7,341 stars in the last 30 days

Explore Similar Projects

Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99
0% · 485 stars
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago · Updated 11 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

rtp-llm by alibaba
0.3% · 1k stars
LLM inference engine for diverse applications
Created 2 years ago · Updated 22 hours ago
Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai
0.0% · 3k stars
SDK for fine-tuning and customizing open-source LLMs
Created 3 years ago · Updated 1 month ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin
2.5% · 15k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 1 month ago