whichllm by Andyyyy64

Hardware-aware local LLM selection tool

Created 4 months ago

5,729 stars

Top 8.7% on SourcePulse

1 Expert Loves This Project

David Cournapeau

Author of scikit-learn

Project Summary

whichllm helps users choose the best local Large Language Model (LLM) for a given machine, going beyond a simple check of whether the model's size fits in memory. It targets engineers, researchers, and power users with an automated, evidence-based ranking that weighs real-world benchmarks against hardware capabilities, so they can pick a model without trial-and-error downloads.

How It Works

whichllm detects the user's hardware (GPU, CPU, RAM) and queries the Hugging Face API for popular LLMs. It ranks models by a composite score that combines real-world benchmarks (LiveBench, Chatbot Arena Elo, Open LLM Leaderboard, etc.), confidence-weighted evidence, and hardware fit. The score accounts for quantization, VRAM usage (weights, KV cache, activations), speed, and recency, which is intended to predict usable performance more accurately than model size alone.
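
To make the VRAM accounting above concrete, here is a minimal sketch of how weights, KV cache, and activations might be summed into a single estimate. It is not whichllm's actual formula: the `ModelShape` fields, the fp16 KV cache, and the 10% activation overhead are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical description of a decoder-only transformer."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key/value heads per layer
    head_dim: int     # dimension of each attention head


def estimate_vram_gib(
    model: ModelShape,
    bits_per_weight: float,
    context_len: int,
    kv_bytes_per_elem: int = 2,          # assumption: fp16 KV cache
    activation_overhead: float = 0.10,   # assumption: 10% of weight memory
) -> float:
    """Rough VRAM estimate in GiB: weights + KV cache + activations."""
    weights = model.n_params * bits_per_weight / 8
    # Each token stores one K and one V vector per KV head in every layer.
    kv_cache = (
        2 * model.n_layers * model.n_kv_heads * model.head_dim
        * context_len * kv_bytes_per_elem
    )
    activations = weights * activation_overhead
    return (weights + kv_cache + activations) / 2**30


def fits(model: ModelShape, bits_per_weight: float,
         context_len: int, vram_gib: float) -> bool:
    """True if the estimate is within the available VRAM."""
    return estimate_vram_gib(model, bits_per_weight, context_len) <= vram_gib


# Illustrative 8B-parameter shape (values are assumptions, not measured data).
example = ModelShape(n_params=8.0e9, n_layers=32, n_kv_heads=8, head_dim=128)
q4_estimate = estimate_vram_gib(example, bits_per_weight=4.5, context_len=8192)
fp16_estimate = estimate_vram_gib(example, bits_per_weight=16, context_len=8192)
```

Under these assumptions the same model needs roughly a third of the memory at about 4.5 bits per weight as it does at 16 bits, which is why quantization has to be part of any hardware-fit check.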

Quick Start & Requirements

Primary install / run command: uvx whichllm (recommended). Alternatives: uv tool install whichllm, brew install andyyyy64/whichllm/whichllm, or pip install whichllm.
Non-default prerequisites and dependencies: Python 3.11+. NVIDIA GPU detection requires nvidia-ml-py; AMD and Apple Silicon hardware is detected automatically.
Estimated setup time or resource footprint: The project describes setup as "One command, run it instantly." The tool runs in an isolated environment created via uv.
Links: The README serves as the official quick-start guide.
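
The install options above can be turned into a small preflight check. The sketch below picks one of the listed commands based on what is available on the machine. The helper name `pick_install_command` is hypothetical, and so is the fallback order after uvx: the source only marks uvx as recommended. It also applies the Python 3.11+ check to the pip route only, which is an assumption.

```python
import shutil
import sys
from typing import Callable, Optional, Tuple

MIN_PYTHON = (3, 11)  # minimum Python version stated in the requirements


def pick_install_command(
    python_version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return one of the documented commands, or None if none applies.

    `which` is injectable so the logic can be exercised without
    depending on what is installed locally.
    """
    if which("uvx"):
        return "uvx whichllm"  # the recommended route
    if which("brew"):
        return "brew install andyyyy64/whichllm/whichllm"
    if which("pip") and python_version >= MIN_PYTHON:
        return "pip install whichllm"
    return None


if __name__ == "__main__":
    command = pick_install_command()
    print(command or "No supported installer found; install uv or Python 3.11+.")
```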

Highlighted Details

Auto-detects hardware (NVIDIA, AMD, Apple Silicon, CPU-only).
Smart ranking based on VRAM fit, speed, and benchmark quality, not just parameter count.
One-command chat execution (whichllm run) with automatic model download and environment setup.
Live data is fetched directly from the Hugging Face API, with curated frozen snapshots as a fallback.
Integrates real evaluation scores with confidence-based dampening and recency awareness.
GPU simulation for hardware planning (whichllm --gpu) and reverse lookup (whichllm plan).
Provides ready-to-run Python code snippets (whichllm snippet).
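
The idea of confidence-based dampening from the list above can be sketched as shrinking each benchmark score toward a neutral baseline in proportion to how little evidence backs it, then blending the result with hardware fit and speed. Everything here is an assumption for illustration: the `Candidate` fields, the baseline of 50, and the weights are hypothetical, not whichllm's real scoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical per-model inputs, each already normalized."""
    name: str
    benchmark: float   # raw benchmark score on a 0..100 scale
    confidence: float  # 0..1, how well supported the score is
    fit: float         # 0..1, how comfortably the model fits in VRAM
    speed: float       # 0..1, normalized expected tokens per second


def dampen(score: float, confidence: float, baseline: float = 50.0) -> float:
    """Pull a score toward the baseline when confidence is low."""
    c = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    return baseline + c * (score - baseline)


def composite(c: Candidate, w_quality: float = 0.6,
              w_fit: float = 0.25, w_speed: float = 0.15) -> float:
    """Weighted blend of dampened quality, hardware fit, and speed."""
    quality = dampen(c.benchmark, c.confidence) / 100
    return w_quality * quality + w_fit * c.fit + w_speed * c.speed


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Best candidate first."""
    return sorted(candidates, key=composite, reverse=True)


# A high but weakly supported score versus a lower, well-supported one.
unverified = Candidate("unverified-model", benchmark=80, confidence=0.2, fit=1.0, speed=0.5)
verified = Candidate("verified-model", benchmark=70, confidence=1.0, fit=1.0, speed=0.5)
ranking = rank([unverified, verified])
```

In this example the well-supported model ranks first even though its raw score is lower, because the unverified score of 80 is dampened to 56 before it enters the blend.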

Maintenance & Community

Contributions are welcome; refer to CONTRIBUTING.md for guidelines. The project lists no community channels (e.g., Discord, Slack), notable contributors, or sponsorships.

Licensing & Compatibility

License type: MIT.
Compatibility notes: The MIT license is permissive. It allows commercial use and linking from closed-source software, provided the copyright and license notice is retained in copies of the software.

Limitations & Caveats

Scores carry a marker (~ or ?) when direct benchmark data is unavailable and the score is instead inherited or interpolated; such scores may be less accurate for some models. Apple Silicon and CPU-only modes are restricted to GGUF formats for stability. The system actively rejects fabricated or misleading benchmark claims from model uploaders.

Health Check

Last Commit

3 days ago

Responsiveness

Inactive

Star History

1,233 stars in the last 30 days