whichllm  by Andyyyy64

Hardware-aware local LLM selection tool

Created 2 months ago
1,818 stars

Top 23.3% on SourcePulse

1 Expert Loves This Project
Project Summary

This project addresses the challenge of selecting the optimal local Large Language Model (LLM) for specific hardware, moving beyond simple size-based fitting. It targets engineers, researchers, and power users by providing an automated, evidence-based ranking system that considers real-world benchmarks and hardware capabilities, enabling faster and more informed LLM adoption.

How It Works

whichllm automatically detects a user's hardware (GPU, CPU, RAM) and queries the HuggingFace API for popular LLMs. It ranks models by a composite score derived from real-world benchmarks (LiveBench, Chatbot Arena Elo, Open LLM Leaderboard, etc.), confidence-weighted evidence, and hardware fit. The score accounts for quantization, VRAM usage (weights, KV cache, activations), speed, and recency, which gives a more accurate performance prediction than parameter count alone.
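
As a rough illustration of the VRAM accounting described above, the following Python sketch adds up quantized weights, KV cache, and an activation allowance. It is not whichllm's implementation: the ModelShape fields, the 10% activation overhead, and the example model dimensions are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical description of a decoder-only transformer."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key/value heads (fewer than query heads under GQA)
    head_dim: int     # dimension of each attention head


def weights_bytes(shape: ModelShape, bits_per_weight: float) -> float:
    """Memory for the quantized weights."""
    return shape.n_params * bits_per_weight / 8


def kv_cache_bytes(shape: ModelShape, context_len: int, bytes_per_elem: int = 2) -> int:
    """Memory for the KV cache: one K and one V tensor per layer, per token."""
    return (2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
            * context_len * bytes_per_elem)


def estimate_vram_gib(shape: ModelShape, bits_per_weight: float, context_len: int,
                      activation_overhead: float = 0.10) -> float:
    """Weights + KV cache, padded by an assumed fraction for activations."""
    base = weights_bytes(shape, bits_per_weight) + kv_cache_bytes(shape, context_len)
    return base * (1 + activation_overhead) / GIB


def fits(shape: ModelShape, bits_per_weight: float, context_len: int,
         vram_gib: float) -> bool:
    """True if the estimate fits in the given amount of VRAM."""
    return estimate_vram_gib(shape, bits_per_weight, context_len) <= vram_gib


# Illustrative 8B-parameter model at about 4.5 bits per weight, 8192-token context.
example = ModelShape(n_params=8e9, n_layers=32, n_kv_heads=8, head_dim=128)
print(f"{estimate_vram_gib(example, 4.5, 8192):.2f} GiB")
print(fits(example, 4.5, 8192, vram_gib=8.0))
```

Even this simplified estimate shows why size alone is a weak predictor: the same model can fit or not fit depending on quantization level and context length.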

Quick Start & Requirements

  • Primary install / run command: uvx whichllm (recommended), uv tool install whichllm, brew install andyyyy64/whichllm/whichllm, or pip install whichllm.
  • Non-default prerequisites and dependencies: Python 3.11+. NVIDIA GPU detection requires the nvidia-ml-py package. AMD and Apple Silicon hardware is detected automatically.
  • Estimated setup time or resource footprint: "One command, run it instantly." Creates an isolated environment via uv.
  • Links: Official quick-start is the README itself.
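
For context on the nvidia-ml-py prerequisite, here is a minimal sketch of how that package (imported as pynvml) is commonly used to list NVIDIA GPUs and their total memory. It is an assumption about the general approach, not whichllm's actual detection code, and the function name detect_nvidia_gpus is hypothetical. It returns an empty list when the package or the NVIDIA driver is unavailable.

```python
def detect_nvidia_gpus() -> list[dict]:
    """Return [{'name': ..., 'vram_gib': ...}, ...], or [] if NVML is unavailable."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return []

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No NVIDIA driver or NVML library on this machine.
        return []

    try:
        gpus = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({"name": name, "vram_gib": memory.total / 1024 ** 3})
        return gpus
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    for gpu in detect_nvidia_gpus():
        print(f"{gpu['name']}: {gpu['vram_gib']:.1f} GiB")
```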

Highlighted Details

  • Auto-detects hardware (NVIDIA, AMD, Apple Silicon, CPU-only).
  • Smart ranking based on VRAM fit, speed, and benchmark quality, not just parameter count.
  • One-command chat execution (whichllm run) with automatic model download and environment setup.
  • Live data fetched directly from HuggingFace API, with curated frozen fallbacks.
  • Integrates real evaluation scores with confidence-based dampening and recency awareness.
  • GPU simulation for hardware planning (whichllm --gpu) and reverse lookup (whichllm plan).
  • Provides ready-to-run Python code snippets (whichllm snippet).
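
The ranking ideas in this list (hardware fit, speed, benchmark quality, and confidence-based dampening) can be sketched as follows. This is a hypothetical scoring function, not the formula whichllm uses: the weights, the neutral prior of 0.5, the speed cap, and the Candidate fields are all assumptions, and recency is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one model under consideration."""
    name: str
    benchmark: float         # normalized benchmark score in [0, 1]
    confidence: float        # in [0, 1]; lower when evidence is indirect
    vram_needed_gib: float   # estimated VRAM requirement
    est_tokens_per_s: float  # estimated generation speed


def dampened_benchmark(c: Candidate, prior: float = 0.5) -> float:
    """Shrink the benchmark toward a neutral prior when confidence is low."""
    return c.confidence * c.benchmark + (1 - c.confidence) * prior


def composite_score(c: Candidate, vram_gib: float, w_quality: float = 0.7,
                    w_speed: float = 0.3, speed_cap: float = 60.0) -> float:
    """Weighted mix of dampened quality and capped speed; 0 if the model does not fit."""
    if c.vram_needed_gib > vram_gib:
        return 0.0
    speed = min(c.est_tokens_per_s, speed_cap) / speed_cap
    return w_quality * dampened_benchmark(c) + w_speed * speed


def rank(candidates: list[Candidate], vram_gib: float) -> list[str]:
    """Names of models that fit, best score first."""
    scored = [(composite_score(c, vram_gib), c.name) for c in candidates]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


candidates = [
    Candidate("A", benchmark=0.80, confidence=1.0, vram_needed_gib=6, est_tokens_per_s=30),
    Candidate("B", benchmark=0.90, confidence=0.2, vram_needed_gib=6, est_tokens_per_s=30),
    Candidate("C", benchmark=0.95, confidence=1.0, vram_needed_gib=20, est_tokens_per_s=10),
]
print(rank(candidates, vram_gib=8.0))
```

In this toy data, model B claims a higher benchmark than A but with little supporting evidence, so dampening pulls its score toward the prior and A ranks first; C is excluded because it does not fit in 8 GiB.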

Maintenance & Community

Contributions are welcome; refer to CONTRIBUTING.md for guidelines. No community channels (e.g., Discord, Slack), notable contributors, or sponsorships are listed.

Licensing & Compatibility

  • License type: MIT.
  • Compatibility notes: The MIT license is permissive and generally compatible with commercial use and closed-source linking without significant restrictions.

Limitations & Caveats

The ranking score includes markers (~, ?) indicating when direct benchmark data is unavailable, relying instead on inherited or interpolated scores, which may affect accuracy for some models. Apple Silicon and CPU-only modes are restricted to GGUF formats for stability. The system actively rejects fabricated or misleading benchmark claims from model uploaders.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 36
  • Issues (30d): 32
  • Star history: 1,810 stars in the last 30 days
