CLI for serving LLMs with vLLM
Top 71.5% on SourcePulse
This tool provides a command-line interface for serving Large Language Models using vLLM, targeting developers and researchers who need a streamlined way to deploy and manage LLM servers. It offers both interactive and scriptable modes, simplifying model discovery, configuration, and server monitoring.
How It Works
vLLM CLI leverages vLLM for efficient LLM serving and integrates with hf-model-tool for comprehensive local and remote model discovery. It automatically detects models from the HuggingFace Hub, Ollama directories, and custom locations. The tool manages vLLM server processes, allowing flexible configuration through profiles and direct command-line arguments, with built-in support for LoRA adapters.
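As a concrete sketch of that workflow, the following launches a server under one of the bundled profiles. The serve subcommand, the positional model argument, and the --profile flag are assumptions based on the description above, not verified against the project's documentation.

# Sketch only: subcommand and flag names are assumed, not confirmed
vllm-cli serve meta-llama/Llama-3.1-8B-Instruct --profile high_throughput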
Quick Start & Requirements
pip install vllm-cli
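After installation, a plausible first session looks like the sketch below. Entering interactive mode via the bare command is an assumption drawn from the tool's description; the curl check, by contrast, targets the OpenAI-compatible endpoint that vLLM servers expose by default on port 8000.

# Assumed: bare invocation opens the interactive mode for model
# discovery, configuration, and server monitoring
vllm-cli

# Once a server is running, vLLM exposes an OpenAI-compatible API;
# listing the served models is a quick health check
curl http://localhost:8000/v1/models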
Highlighted Details
- Built-in configuration profiles for common serving scenarios (e.g., moe_optimized, high_throughput, low_memory).
- Integration with hf-model-tool for unified model management across various sources.
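On the model-management side, something like the following would list models discovered via hf-model-tool across HuggingFace, Ollama, and custom directories; the models subcommand is hypothetical, named here only for illustration.

# Hypothetical subcommand: lists models found by hf-model-tool across
# the HuggingFace cache, Ollama directories, and custom paths
vllm-cli models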
Maintenance & Community
Last updated 3 weeks ago; repository activity is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
Currently supports only NVIDIA GPUs; AMD GPU support is a planned future enhancement. Ollama GGUF model support is experimental.