Discover and explore top open-source AI tools and projects—updated daily.
noonghunnaLocal LLM serving recipes for RTX 3090 GPUs
New!
Top 33.2% on SourcePulse
This repository provides community-vetted configurations and benchmarks for serving modern LLMs locally on NVIDIA RTX 3090 GPUs. It targets users with one or two 3090s seeking to run LLMs at home or in a homelab, offering optimized setups for maximum throughput or maximum context/robustness, complete with a drop-in OpenAI-compatible API.
How It Works
The project employs a multi-engine, model-agnostic approach, supporting vLLM for high-throughput inference (up to 127 TPS) and llama.cpp for maximum context length (262K) and robustness. Configurations are provided via Docker Compose, enabling an OpenAI-compatible API endpoint. The architecture scales easily for new models, currently featuring Qwen3.6-27B.
Quick Start & Requirements
Installation involves cloning the repo, running scripts/setup.sh <model>, and then scripts/launch.sh for an interactive setup. Key requirements include 1-2x NVIDIA RTX 3090 (24 GB each), Linux (Ubuntu 22.04+ tested), Docker with NVIDIA Container Toolkit, and NVIDIA driver 580.x+. Detailed hardware notes are in docs/HARDWARE.md.
Highlighted Details
Maintenance & Community
The project acknowledges contributions from key individuals and projects like vLLM and llama.cpp. It consolidates previous efforts into a single repository, encouraging new issues here. Community feedback from Reddit/X is noted, but direct community links are absent.
Licensing & Compatibility
Licensed under Apache 2.0, permitting broad usage, modification, and distribution, including for commercial purposes.
Limitations & Caveats
Focuses on RTX 3090 (24 GB); smaller GPUs are insufficient for 27B models. vLLM requires Linux/CUDA; llama.cpp recipes assume Linux paths. SGLang engine is currently blocked.
19 hours ago
Inactive
AI-Hypercomputer
lemonade-sdk