recipes by vllm-project

LLM inference recipes

Created 5 months ago
330 stars

Top 83.1% on SourcePulse

Project Summary

This repository provides a curated collection of community-maintained recipes for running the vLLM inference engine with a wide array of large language models. It targets engineers and researchers seeking practical, ready-to-use configurations for deploying specific models on diverse hardware for various tasks, simplifying the often complex process of model deployment.

How It Works

The project functions as a central hub for practical examples, answering the recurring question of how to run a specific model (such as Llama, Qwen, or DeepSeek) with vLLM. Each recipe typically bundles configuration files, command-line examples, and sometimes helper scripts tailored to a particular model architecture, version, and task (e.g., OCR, vision-language). The community-driven approach helps keep recipes current and broad in coverage.
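As an illustrative sketch only (the model ID, flag values, and hardware assumptions below are guesses, not taken from any specific recipe), the core of a recipe often reduces to a single vllm serve invocation:

    # Hypothetical serving command; real recipes pin their own model ID and flags.
    # --tensor-parallel-size shards the model across 2 GPUs;
    # --max-model-len caps the context window to fit GPU memory.
    vllm serve meta-llama/Llama-3.1-8B-Instruct \
        --tensor-parallel-size 2 \
        --max-model-len 8192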

Quick Start & Requirements

To build the documentation locally, users should set up a virtual environment (uv venv), activate it (source .venv/bin/activate), install dependencies (uv pip install -r requirements.txt), and then serve the documentation (uv run mkdocs serve). Specific hardware or software prerequisites for running the models themselves are detailed within individual recipes, not in the main README.
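In sequence, those documented steps are:

    uv venv                              # create a virtual environment
    source .venv/bin/activate            # activate it
    uv pip install -r requirements.txt   # install the documentation dependencies
    uv run mkdocs serve                  # build and serve the docs locally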

Highlighted Details

  • Extensive model support, including recipes for DeepSeek, Ernie, GLM, Llama (e.g., Llama3.3-70B, Llama3.1), MiniMax, Moonshotai, OpenAI (gpt-oss), PaddlePaddle, Qwen (e.g., Qwen2.5-VL), Seed, and Tencent-Hunyuan models.
  • Covers diverse tasks such as OCR and Vision-Language (VL) capabilities (see the sketch after this list).
  • Community-driven contribution model encourages ongoing expansion and updates.
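To make the VL point concrete: once a vision-language model such as Qwen2.5-VL is served, vLLM exposes an OpenAI-compatible endpoint that accepts image inputs. The model ID, port, and image URL below are placeholders, not taken from any specific recipe:

    # Hypothetical request; assumes a Qwen2.5-VL model served on the default port 8000.
    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "Qwen/Qwen2.5-VL-7B-Instruct",
            "messages": [{
              "role": "user",
              "content": [
                {"type": "text", "text": "Transcribe the text in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}}
              ]
            }]
          }'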

Maintenance & Community

The repository relies on community contributions via Pull Requests (PRs) to add new recipes or improve existing ones. While specific community channels like Discord/Slack are not mentioned, the contribution model itself fosters a collaborative environment.

Licensing & Compatibility

The project is licensed under the Apache License 2.0. This permissive license generally allows for commercial use and integration into closed-source projects, with standard attribution and notice requirements.

Limitations & Caveats

This repository focuses on running existing models with vLLM and does not provide the models themselves or the vLLM engine. Users are expected to have vLLM installed and to adapt the provided recipes to their specific environment and model weights. The effectiveness of recipes may vary depending on the exact model version and hardware configuration.
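As a minimal sketch of that prerequisite, assuming a Linux machine with a supported CUDA GPU (other platforms need different steps; see the vLLM documentation):

    # Baseline install; individual recipes may require specific vLLM versions or extras.
    pip install vllm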

Health Check
Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 28
Issues (30d): 2
Star History: 57 stars in the last 30 days

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Nikola Borisov (Founder and CEO of DeepInfra), and 3 more.

Explore Similar Projects

tensorrtllm_backend by triton-inference-server
Triton backend for serving TensorRT-LLM models
Top 0.1% on SourcePulse · 912 stars · Created 2 years ago · Updated 2 days ago