vllm-cli by Chen-zexi

CLI for serving LLMs with vLLM

Created 1 month ago
408 stars

Top 71.5% on SourcePulse

Project Summary

This tool provides a command-line interface for serving Large Language Models using vLLM, targeting developers and researchers who need a streamlined way to deploy and manage LLM servers. It offers both interactive and scriptable modes, simplifying model discovery, configuration, and server monitoring.

How It Works

vLLM CLI leverages vLLM for efficient LLM serving, integrating with hf-model-tool for comprehensive local and remote model discovery. It supports automatic detection of models from HuggingFace Hub, Ollama directories, and custom locations. The tool manages vLLM server processes, allowing for flexible configuration through profiles and direct command-line arguments, with built-in support for LoRA adapters.
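
To picture the process-management side, here is a minimal, illustrative Python sketch (not vllm-cli's actual implementation): it launches vLLM's OpenAI-compatible server as a managed subprocess, polls the server's /health endpoint until it is ready, and shuts it down. The model id, port, and flag values are placeholder assumptions.

```python
# Illustrative sketch only -- not vllm-cli's actual code.
# Launch vLLM's OpenAI-compatible server as a managed subprocess,
# wait for /health to respond, then terminate it.
import subprocess
import time
import urllib.request

MODEL = "Qwen/Qwen2.5-7B-Instruct"   # placeholder: any local or HF Hub model
PORT = 8000

server = subprocess.Popen(
    ["vllm", "serve", MODEL,
     "--port", str(PORT),
     "--gpu-memory-utilization", "0.90"],
)

try:
    # Poll the health endpoint until the server accepts requests.
    for _ in range(120):
        try:
            with urllib.request.urlopen(f"http://localhost:{PORT}/health", timeout=2):
                print("Server is up; OpenAI-compatible API at "
                      f"http://localhost:{PORT}/v1")
                break
        except OSError:
            time.sleep(5)
    # ... send requests, tail logs, monitor GPU usage, etc. ...
finally:
    server.terminate()
    server.wait()
```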

Quick Start & Requirements

  • Install: pip install vllm-cli
  • Prerequisites: Python 3.11+, CUDA-compatible NVIDIA GPU (see the pre-flight check after this list).
  • Documentation: 📚 Documentation (in the GitHub repository).
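
Before installing, a quick pre-flight check can confirm the prerequisites above; a minimal sketch, assuming PyTorch (a vLLM dependency) is available:

```python
# Pre-flight check for the stated prerequisites: Python 3.11+ and a
# CUDA-compatible NVIDIA GPU. Assumes torch is installed (it ships with vLLM).
import sys

assert sys.version_info >= (3, 11), "vllm-cli requires Python 3.11+"

import torch

assert torch.cuda.is_available(), "A CUDA-compatible NVIDIA GPU is required"
print("Detected GPU:", torch.cuda.get_device_name(0))
```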

Highlighted Details

  • Supports serving models with LoRA adapters.
  • Experimental support for Ollama-downloaded GGUF models.
  • Offers pre-configured profiles for common use cases (e.g., moe_optimized, high_throughput, low_memory); a sketch of how a profile might map to server arguments follows this list.
  • Integrates hf-model-tool for unified model management across various sources.
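
To make the profile idea concrete, the sketch below shows how a named profile could map onto vLLM server arguments, with an optional LoRA adapter attached. The flag values, profile contents, and adapter path are illustrative assumptions, not the project's actual defaults.

```python
# Illustrative only: how named profiles *could* map to vLLM server arguments.
# These values are assumptions for the sketch, not vllm-cli's real profiles.
PROFILES = {
    "high_throughput": ["--gpu-memory-utilization", "0.95", "--max-num-seqs", "512"],
    "low_memory":      ["--gpu-memory-utilization", "0.70", "--max-model-len", "4096",
                        "--enforce-eager"],
    "moe_optimized":   ["--gpu-memory-utilization", "0.90", "--tensor-parallel-size", "2"],
}

def build_serve_command(model: str, profile: str, lora: dict | None = None) -> list[str]:
    """Compose a `vllm serve` command from a model id, a profile, and optional LoRA adapters."""
    cmd = ["vllm", "serve", model, *PROFILES[profile]]
    if lora:
        # vLLM's OpenAI-compatible server accepts LoRA adapters as name=path pairs.
        cmd += ["--enable-lora", "--lora-modules",
                *(f"{name}={path}" for name, path in lora.items())]
    return cmd

print(build_serve_command(
    "Qwen/Qwen2.5-7B-Instruct",               # placeholder model id
    "low_memory",
    lora={"my-adapter": "/path/to/adapter"},  # hypothetical adapter path
))
```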

Maintenance & Community

  • Active development with recent updates (v0.2.3).
  • Contributions are welcome.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

Currently supports only NVIDIA GPUs; AMD GPU support is a planned future enhancement. Ollama GGUF model support is experimental.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 84 stars in the last 30 days
