nano-vllm by GeeeekExplorer

Lightweight vLLM implementation from scratch

Created 8 months ago
11,804 stars

Top 4.3% on SourcePulse

Project Summary

Nano-vLLM is a lightweight, from-scratch implementation of vLLM for fast offline inference of large language models. It is aimed at developers and researchers who want a more accessible and understandable LLM inference engine, and it delivers inference speeds comparable to vLLM's with a much smaller codebase.

How It Works

Nano-vLLM combines several optimizations, including prefix caching, tensor parallelism, Torch compilation, and CUDA graphs, to achieve high inference throughput. Its design prioritizes readability: the codebase targets roughly 1,200 lines of Python, which makes it easier to understand, modify, and extend than larger inference frameworks.
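To make the prefix-caching idea concrete, here is a minimal sketch of block-level prefix caching as it is commonly done in paged-KV-cache engines: the prompt is cut into fixed-size token blocks, each full block is hashed together with the hash of the block before it, and a block whose chained hash has been seen before reuses the existing KV-cache block instead of being recomputed. The block size, the `PrefixCache` class, and its methods are hypothetical illustrations, not nano-vllm's actual code.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # tokens per KV-cache block (hypothetical; real engines use larger blocks)


def block_hash(prev_hash: Optional[str], tokens: List[int]) -> str:
    """Hash one full block of token ids, chained with the preceding block's hash,
    so equal hashes imply the whole prefix up to this block is identical."""
    h = hashlib.sha256()
    if prev_hash is not None:
        h.update(prev_hash.encode())
    h.update(",".join(map(str, tokens)).encode())
    return h.hexdigest()


class PrefixCache:
    """Toy block allocator (hypothetical) that reuses blocks for repeated prefixes."""

    def __init__(self) -> None:
        self.hash_to_block: Dict[str, int] = {}
        self.ref_counts: Dict[int, int] = {}
        self.next_block_id = 0

    def _new_block(self) -> int:
        block_id = self.next_block_id
        self.next_block_id += 1
        self.ref_counts[block_id] = 1
        return block_id

    def allocate(self, token_ids: List[int]) -> Tuple[List[int], int]:
        """Return (block ids for the sequence, number of prompt tokens served from cache)."""
        block_ids: List[int] = []
        cached_tokens = 0
        prev_hash: Optional[str] = None
        for start in range(0, len(token_ids), BLOCK_SIZE):
            chunk = token_ids[start:start + BLOCK_SIZE]
            if len(chunk) < BLOCK_SIZE:
                # A partial trailing block is not shared at allocation time: it will still grow.
                block_ids.append(self._new_block())
                break
            h = block_hash(prev_hash, chunk)
            prev_hash = h
            if h in self.hash_to_block:
                # Cache hit: share the existing block and skip recomputing its KV entries.
                block_id = self.hash_to_block[h]
                self.ref_counts[block_id] += 1
                cached_tokens += BLOCK_SIZE
            else:
                block_id = self._new_block()
                self.hash_to_block[h] = block_id
            block_ids.append(block_id)
        return block_ids, cached_tokens


# Two requests sharing an 8-token "system prompt" prefix.
cache = PrefixCache()
system = [1, 2, 3, 4, 5, 6, 7, 8]
a_blocks, a_hit = cache.allocate(system + [9, 10])
b_blocks, b_hit = cache.allocate(system + [11, 12, 13, 14])
```

The second request reuses the two blocks holding the shared prefix, so only its final block needs fresh KV computation. A real engine would also free blocks when their reference count drops to zero and register a trailing block once it fills; this sketch omits both.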

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
  • Requires model weights to be downloaded separately (e.g., using huggingface-cli).
  • The API mirrors vLLM's interface; example.py shows example usage.

Highlighted Details

  • Benchmarked on an RTX 4070 Laptop (8GB VRAM) with Qwen3-0.6B, achieving 1434 tokens/s, slightly outperforming vLLM's 1361 tokens/s in a specific test configuration.
  • Implements key optimizations: prefix caching, tensor parallelism, Torch compilation, CUDA graphs.
  • Codebase is approximately 1,200 lines of Python.
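Of the optimizations listed, tensor parallelism is the easiest to illustrate without a GPU. The sketch below shows the general column-parallel split used for linear layers: the weight matrix is sharded by output columns across ranks, each rank multiplies the full input by its shard, and concatenating the partial outputs (the job of an all-gather on real hardware) reproduces the unsharded result. The helper names are hypothetical, the "ranks" are simulated in a single process, and the summary does not say exactly how nano-vllm shards each layer.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def split_columns(w: Matrix, world_size: int) -> List[Matrix]:
    """Shard a weight matrix column-wise, one equal slice per simulated rank."""
    cols = len(w[0])
    assert cols % world_size == 0, "output dim must divide evenly across ranks"
    step = cols // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def column_parallel_linear(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Each rank computes x @ shard; rows of the partial outputs are concatenated
    along the feature axis, standing in for an all-gather across GPUs."""
    partials = [matmul(x, shard) for shard in split_columns(w, world_size)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]                          # batch of 2 inputs, hidden size 2
w = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 3.0]]     # 2 x 4 weight
full = matmul(x, w)
sharded = column_parallel_linear(x, w, world_size=2)
```

Here `full` and `sharded` are both `[[2.0, 2.0, 2.0, 5.0], [5.0, 4.0, 6.0, 9.0]]`. The complementary row-parallel split shards the input dimension instead and combines partial results with an all-reduce (sum) rather than concatenation.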

Maintenance & Community

No specific information on contributors, sponsorships, or community channels (Discord/Slack) is provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The licensing terms should be clarified before commercial use or integration into closed-source projects.

Limitations & Caveats

The project's license is not specified, which may be a barrier to commercial adoption. The README does not detail compatibility with hardware configurations other than the benchmarked RTX 4070 Laptop, nor with specific CUDA versions.

Health Check

Last Commit: 3 months ago
Responsiveness: 1 day
Pull Requests (30d): 6
Issues (30d): 14
Star History: 718 stars in the last 30 days