eugr: Dockerized vLLM for high-performance multi-node inference
Top 73.8% on SourcePulse
This repository provides Docker configurations and startup scripts for deploying vLLM, a high-throughput LLM inference engine, on DGX Spark systems. It targets users needing to run large language models efficiently in multi-node or single-node setups, leveraging Ray for cluster management and InfiniBand/RDMA for high-performance communication. The primary benefit is enabling optimized, scalable LLM inference on specialized hardware.
How It Works
The project utilizes Docker to package vLLM and its dependencies, integrating with Ray for distributed execution across multiple DGX Spark nodes. It is specifically engineered to leverage InfiniBand/NCCL for low-latency, high-bandwidth inter-node communication, crucial for large-scale inference workloads. The approach prioritizes performance by building directly from the vLLM main branch and offering optimizations for DGX Spark's networking and hardware architecture.
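The Ray-plus-NCCL pattern described above can be sketched with standard upstream commands. Everything below uses generic Ray, NCCL, and vLLM interfaces rather than this repo's wrapper scripts; the InfiniBand device and interface names (mlx5, ib0) and the placeholder model and IP are assumptions for a typical setup.

```shell
# Generic multi-node sketch (not this repo's launch-cluster.sh).

# Head node: start the Ray cluster.
ray start --head --port=6379

# Each worker node: join the cluster.
ray start --address=<head-node-ip>:6379

# Standard NCCL environment variables to pin collectives to InfiniBand/RDMA
# (device and interface names are assumptions, not taken from the repo):
export NCCL_IB_HCA=mlx5
export NCCL_SOCKET_IFNAME=ib0

# Head node: shard the model across nodes via the Ray executor backend.
vllm serve <model> \
  --tensor-parallel-size 2 \
  --distributed-executor-backend ray
```

Ray handles process placement across nodes, while NCCL carries the actual tensor-parallel collectives over InfiniBand.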
Quick Start & Requirements
- Build: ./build-and-copy.sh (recommended for cluster deployment) or docker build -f Dockerfile.wheels -t vllm-node . (for a wheels build)
- Run single-node: ./launch-cluster.sh --solo exec vllm serve <model> ... or docker run ... vllm serve <model> ...
- Run multi-node: ./launch-cluster.sh exec vllm serve <model> ...
- Use the --gpu-arch option to target a different GPU architecture.
- uvx is required for the hf-download.sh script.
- Nightly wheels source: https://wheels.vllm.ai/nightly/cu130/vllm/

Highlighted Details
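Expanding the single-node path above, a hedged docker invocation might look like the following; the port mapping, --ipc=host, and --max-model-len values are illustrative assumptions, while the vllm-node image tag comes from the wheels build command.

```shell
# Hypothetical single-node run using the image built above.
docker run --gpus all --ipc=host -p 8000:8000 \
  vllm-node \
  vllm serve <model> --max-model-len 8192
# vllm serve exposes an OpenAI-compatible API, here published on port 8000.
```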
- Helper scripts (build-and-copy.sh, launch-cluster.sh, hf-download.sh) for streamlined build, deployment, and model management.
- fastsafetensors support for accelerated model loading.

Maintenance & Community
This project is a community effort, not officially affiliated with NVIDIA. It acknowledges contributions from individuals like @raphaelamorim and @ericlewis. Specific community channels or roadmaps are not detailed in the provided README excerpt.
Licensing & Compatibility
The specific open-source license for this repository is not explicitly stated in the provided README excerpt. Compatibility notes for commercial use or integration with closed-source projects are also not detailed.
Limitations & Caveats
The Dockerfile builds from the vLLM main branch, which may occasionally be unstable. Wheel builds can hit platform-specific dependency issues. NVFP4 models on Spark currently show suboptimal performance and potential stability issues in vLLM. fastsafetensors support in cluster configurations is experimental. The default build targets the 12.1a GPU architecture (DGX Spark); other architectures require explicit configuration, e.g. via the --gpu-arch option.
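As an illustration of retargeting the build, upstream vLLM's Dockerfile selects target architectures through a torch_cuda_arch_list build argument; a comparable override for the wheels build might look like the following. Whether Dockerfile.wheels forwards this argument is an assumption here, since this repo wraps architecture selection in its own --gpu-arch option.

```shell
# Hypothetical sketch: retargeting the image to a different GPU architecture.
# torch_cuda_arch_list is upstream vLLM's Dockerfile build arg (assumed to be
# honored by Dockerfile.wheels); "9.0" would select Hopper (H100), for example.
docker build -f Dockerfile.wheels -t vllm-node \
  --build-arg torch_cuda_arch_list="9.0" .
```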