rvLLM: High-performance LLM inference engine in Rust
Top 70.7% on SourcePulse
Summary
rvLLM is a high-performance LLM inference engine written from scratch in Rust and designed as a drop-in replacement for vLLM. It targets engineers and power users who want dramatically better resource efficiency, faster startup times, and smaller deployment footprints than Python-based alternatives.
How It Works
The core innovation is a pure-Rust implementation that eliminates Python's GIL, garbage-collector, and interpreter overhead. rvLLM employs a novel Rust-native PTX compiler that generates fused GPU kernels at model load time, achieving 2-7.5x speedups over hand-written CUDA for specific operations. It also features an FA3 v3 attention mechanism with cp.async and split-KV for long contexts, alongside CUDA graph replay and cuBLAS autotuning for optimized execution.
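The payoff of kernel fusion can be illustrated in plain Rust: fusing two passes over data into one avoids materializing an intermediate buffer, which is a CPU analogy to what a fused GPU kernel saves in global-memory round-trips. All names below are illustrative, not rvLLM's API:

```rust
// CPU analogy of kernel fusion: compute y = relu(a*x + b) in one pass
// instead of materializing the intermediate a*x + b.

fn unfused(a: f32, b: f32, xs: &[f32]) -> Vec<f32> {
    // Pass 1: affine transform into a temporary buffer.
    let tmp: Vec<f32> = xs.iter().map(|&x| a * x + b).collect();
    // Pass 2: activation over the temporary.
    tmp.iter().map(|&t| t.max(0.0)).collect()
}

fn fused(a: f32, b: f32, xs: &[f32]) -> Vec<f32> {
    // Single pass: no intermediate allocation, half the memory traffic.
    xs.iter().map(|&x| (a * x + b).max(0.0)).collect()
}

fn main() {
    let xs = [-2.0f32, -0.5, 0.0, 1.5, 3.0];
    let u = unfused(2.0, -1.0, &xs);
    let f = fused(2.0, -1.0, &xs);
    assert_eq!(u, f); // identical results, fewer passes over memory
    println!("{:?}", f); // prints [0.0, 0.0, 0.0, 2.0, 5.0]
}
```

On a GPU the temporary buffer lives in global memory, so eliminating it removes an entire kernel launch plus a write-then-read of the whole tensor, which is where fused kernels earn their speedups.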
Quick Start & Requirements
Install with `cargo install rvllm` or `pip install rvllm`. Building from source requires `cargo build --release --features cuda`. See docs/arch.md, docs/benchmark-history.md, and docs/cutlass-epilogue-spec.md for architecture and benchmark details.
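Since rvLLM positions itself as a drop-in replacement for vLLM, a running server should answer vLLM-style OpenAI-compatible requests. The endpoint, port, and model name below are assumptions for illustration, not confirmed by the README:

```shell
# Assumes an rvLLM server is already listening on localhost:8000 with an
# OpenAI-compatible API (as vLLM exposes); the model name is a placeholder.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Explain kernel fusion in one sentence.",
    "max_tokens": 64
  }'
```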
Maintenance & Community
The provided README does not contain specific details regarding maintainers, community channels (like Discord/Slack), or a public roadmap.
Licensing & Compatibility
The license type is not explicitly stated in the provided README content. Compatibility for commercial use or closed-source linking cannot be determined without this information.
Limitations & Caveats
rvLLM exhibits performance gaps compared to vLLM, particularly in HTTP throughput (0.67-0.88x) and direct engine throughput (0.82-0.96x), primarily due to differences in GEMM tuning and attention kernel optimizations. Its scheduler is less mature than vLLM's. Quantization support is limited to FP8 weights, whereas vLLM supports a wider range of formats. Speculative decoding is experimental and shows limited benefit on smaller models.
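The FP8 limitation above refers to 8-bit floating-point weight storage. The sketch below decodes the common E4M3 format (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, maximum finite value 448) and quantizes by nearest-value search, to show why FP8's range and precision are much narrower than FP16's. It is an illustration of the format, not rvLLM's quantizer:

```rust
// Brute-force FP8 E4M3 quantizer: decode all 256 codes, pick the nearest.

fn decode_e4m3(code: u8) -> Option<f32> {
    let sign = if code & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = (code >> 3) & 0x0F;
    let man = (code & 0x07) as f32;
    if exp == 0x0F && code & 0x07 == 0x07 {
        return None; // the only NaN encoding in E4M3 (no infinities)
    }
    let val = if exp == 0 {
        man / 8.0 * 2f32.powi(-6) // subnormal: smallest step is 2^-9
    } else {
        (1.0 + man / 8.0) * 2f32.powi(exp as i32 - 7)
    };
    Some(sign * val)
}

fn quantize_e4m3(x: f32) -> u8 {
    // Nearest representable value over all valid codes (tiny search space).
    (0u8..=255)
        .filter_map(|c| decode_e4m3(c).map(|v| (c, v)))
        .min_by(|a, b| (a.1 - x).abs().partial_cmp(&(b.1 - x).abs()).unwrap())
        .unwrap()
        .0
}

fn main() {
    for w in [0.1f32, 1.0, 3.14159, 300.0, 1000.0] {
        let q = quantize_e4m3(w);
        println!("{w} -> code 0x{q:02X} -> {:?}", decode_e4m3(q));
    }
}
```

Note how 1000.0 clamps to 448.0, the largest finite E4M3 value: per-tensor or per-channel scale factors are what make FP8 weight storage usable in practice, and supporting the wider zoo of integer and group-quantized formats that vLLM handles is additional engineering on top of this.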