worker-vllm by runpod-workers

RunPod worker template for blazing-fast LLM endpoints

created 2 years ago · 337 stars · Top 82.8% on sourcepulse

View on GitHub
Project Summary

This project provides a RunPod worker template for deploying large language model (LLM) endpoints using the vLLM inference engine. It targets developers and researchers who need to serve LLMs efficiently and offers OpenAI-compatible API endpoints for seamless integration with existing applications.
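Because the API is OpenAI-compatible, the official openai Python client can point at a deployed endpoint directly. The sketch below is illustrative only: ENDPOINT_ID, the API key, and the model name are placeholders, and the /openai/v1 base path should be verified against RunPod's documentation.

    # Minimal sketch: a chat completion against a deployed worker-vllm endpoint.
    # ENDPOINT_ID, the API key, and the model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_RUNPOD_API_KEY",
        base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
    )

    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # whichever model the worker serves
        messages=[{"role": "user", "content": "Explain PagedAttention in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)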

How It Works

The worker leverages vLLM's optimized inference capabilities, including PagedAttention for efficient memory management and continuous batching for high throughput. It supports a wide range of Hugging Face-compatible model architectures and can be configured via environment variables or by building a custom Docker image with the model baked in. This approach allows for flexible deployment and fine-grained control over serving parameters.
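As an illustration of the environment-variable approach, the Python mapping below lists a few representative settings. The variable names follow the README's conventions, but exact names, accepted values, and defaults should be taken from the project's documentation.

    # Representative worker configuration, expressed as a Python mapping.
    # Names and values are examples only; consult the README for the full list.
    worker_env = {
        "MODEL_NAME": "mistralai/Mistral-7B-Instruct-v0.2",  # Hugging Face model to serve
        "QUANTIZATION": "awq",        # optional weight-quantization scheme
        "TENSOR_PARALLEL_SIZE": "2",  # shard the model across two GPUs
        "MAX_MODEL_LEN": "8192",      # cap context length to bound KV-cache memory
    }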

Quick Start & Requirements

  • Install/Run: Deploy via a RunPod Serverless Endpoint using the pre-built Docker images (e.g., runpod/worker-v1-vllm:v2.4.0stable-cuda12.1.0); see the invocation sketch after this list.
  • Prerequisites: a RunPod account; CUDA 12.1.0+ is recommended.
  • Setup: near-instant deployment thanks to image caching.
  • Docs: RunPod Serverless Worker vLLM
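Once deployed, the endpoint can also be invoked through the worker's native handler rather than the OpenAI-compatible route. A minimal sketch using the runpod Python SDK follows; ENDPOINT_ID is a placeholder, and the input schema (prompt plus sampling_params) should be verified against the README.

    # Native (non-OpenAI) invocation via the runpod SDK.
    # ENDPOINT_ID is a placeholder; verify the input schema against the README.
    import runpod

    runpod.api_key = "YOUR_RUNPOD_API_KEY"
    endpoint = runpod.Endpoint("ENDPOINT_ID")

    result = endpoint.run_sync(
        {
            "prompt": "Explain continuous batching in one paragraph.",
            "sampling_params": {"temperature": 0.7, "max_tokens": 150},
        },
        timeout=120,  # seconds to wait for the synchronous result
    )
    print(result)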

Highlighted Details

  • OpenAI-compatible API for Chat Completions and Models.
  • Supports numerous LLM architectures including Llama, Mistral, Mixtral, Qwen, and more.
  • Extensive configuration options for quantization, tensor parallelism, batching, and sampling.
  • Image caching across RunPod machines for rapid deployment.

Maintenance & Community

Maintained by RunPod. Community support channels are not explicitly listed in the README.

Licensing & Compatibility

The project itself appears to be open-source, but the underlying vLLM library carries its own license (Apache-2.0). Whether a deployment can be used commercially also depends on the license of the specific LLM model served.

Limitations & Caveats

The README notes that the logit_bias and user parameters of the OpenAI API are not supported, as vLLM does not implement them. Some advanced configurations or less common model architectures may require building a custom Docker image.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 5
  • Issues (30d): 6
  • Star history: 33 stars in the last 90 days

Explore Similar Projects

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

dynamo by ai-dynamo

Inference framework for distributed generative AI model serving
Top 1.1% · 5k stars · created 5 months ago · updated 22 hours ago