vllm-cli by Chen-zexi

CLI for serving LLMs with vLLM

Created 1 month ago
408 stars

Top 71.5% on SourcePulse

Project Summary

This tool provides a command-line interface for serving Large Language Models using vLLM, targeting developers and researchers who need a streamlined way to deploy and manage LLM servers. It offers both interactive and scriptable modes, simplifying model discovery, configuration, and server monitoring.

How It Works

vLLM CLI leverages vLLM for efficient LLM serving, integrating with hf-model-tool for comprehensive local and remote model discovery. It supports automatic detection of models from HuggingFace Hub, Ollama directories, and custom locations. The tool manages vLLM server processes, allowing for flexible configuration through profiles and direct command-line arguments, with built-in support for LoRA adapters.
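
To picture the process-management side, here is a minimal, illustrative Python sketch (not vllm-cli's actual implementation): it launches vLLM's OpenAI-compatible server as a managed subprocess, polls the server's /health endpoint until it is ready, and shuts it down. The model id, port, and flag values are placeholder assumptions.

```python
# Illustrative sketch only -- not vllm-cli's actual code.
# Launch vLLM's OpenAI-compatible server as a managed subprocess,
# wait for /health to respond, then terminate it.
import subprocess
import time
import urllib.request

MODEL = "Qwen/Qwen2.5-7B-Instruct"   # placeholder: any local or HF Hub model
PORT = 8000

server = subprocess.Popen(
    ["vllm", "serve", MODEL,
     "--port", str(PORT),
     "--gpu-memory-utilization", "0.90"],
)

try:
    # Poll the health endpoint until the server accepts requests.
    for _ in range(120):
        try:
            with urllib.request.urlopen(f"http://localhost:{PORT}/health", timeout=2):
                print("Server is up; OpenAI-compatible API at "
                      f"http://localhost:{PORT}/v1")
                break
        except OSError:
            time.sleep(5)
    # ... send requests, tail logs, monitor GPU usage, etc. ...
finally:
    server.terminate()
    server.wait()
```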

Quick Start & Requirements

  • Install: pip install vllm-cli
  • Prerequisites: Python 3.11+, CUDA-compatible NVIDIA GPU (see the pre-flight check after this list).
  • Documentation: 📚 Documentation (in the GitHub repository).
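
Before installing, a quick pre-flight check can confirm the prerequisites above; a minimal sketch, assuming PyTorch (a vLLM dependency) is available:

```python
# Pre-flight check for the stated prerequisites: Python 3.11+ and a
# CUDA-compatible NVIDIA GPU. Assumes torch is installed (it ships with vLLM).
import sys

assert sys.version_info >= (3, 11), "vllm-cli requires Python 3.11+"

import torch

assert torch.cuda.is_available(), "A CUDA-compatible NVIDIA GPU is required"
print("Detected GPU:", torch.cuda.get_device_name(0))
```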

Highlighted Details

  • Supports serving models with LoRA adapters.
  • Experimental support for Ollama-downloaded GGUF models.
  • Offers pre-configured profiles for common use cases (e.g., moe_optimized, high_throughput, low_memory); a sketch of how a profile might map to server arguments follows this list.
  • Integrates hf-model-tool for unified model management across various sources.
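
To make the profile idea concrete, the sketch below shows how a named profile could map onto vLLM server arguments, with an optional LoRA adapter attached. The flag values, profile contents, and adapter path are illustrative assumptions, not the project's actual defaults.

```python
# Illustrative only: how named profiles *could* map to vLLM server arguments.
# These values are assumptions for the sketch, not vllm-cli's real profiles.
PROFILES = {
    "high_throughput": ["--gpu-memory-utilization", "0.95", "--max-num-seqs", "512"],
    "low_memory":      ["--gpu-memory-utilization", "0.70", "--max-model-len", "4096",
                        "--enforce-eager"],
    "moe_optimized":   ["--gpu-memory-utilization", "0.90", "--tensor-parallel-size", "2"],
}

def build_serve_command(model: str, profile: str, lora: dict | None = None) -> list[str]:
    """Compose a `vllm serve` command from a model id, a profile, and optional LoRA adapters."""
    cmd = ["vllm", "serve", model, *PROFILES[profile]]
    if lora:
        # vLLM's OpenAI-compatible server accepts LoRA adapters as name=path pairs.
        cmd += ["--enable-lora", "--lora-modules",
                *(f"{name}={path}" for name, path in lora.items())]
    return cmd

print(build_serve_command(
    "Qwen/Qwen2.5-7B-Instruct",               # placeholder model id
    "low_memory",
    lora={"my-adapter": "/path/to/adapter"},  # hypothetical adapter path
))
```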

Maintenance & Community

  • Active development with recent updates (v0.2.3).
  • Contributions are welcome.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

Currently supports only NVIDIA GPUs; AMD GPU support is a planned future enhancement. Ollama GGUF model support is experimental.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 84 stars in the last 30 days
