sparkrun by spark-arena

LLM inference management for NVIDIA DGX Spark systems

Created 4 months ago

375 stars

Top 75.4% on SourcePulse

Project Summary

sparkrun is a command-line tool for launching, managing, and stopping Large Language Model (LLM) inference workloads on NVIDIA DGX Spark systems. It targets users who want to run LLM inference without the complexity of cluster management tools such as Slurm or Kubernetes.

How It Works

sparkrun provides a single command-line interface for managing LLM workloads across one or more DGX Spark nodes. It supports multiple inference runtimes: vLLM, SGLang, and llama.cpp. For multi-node tensor parallelism, it automatically detects and configures InfiniBand/RDMA networking. Workloads are defined as recipes stored in Git-based registries, which let users access, share, and manage model configurations and benchmarks from official, community, and custom sources. A guided setup wizard automates initial cluster configuration, including SSH mesh setup, network detection, and resource management daemon configuration.
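
To make the idea of tensor parallelism concrete, here is a minimal sketch in plain Python of a column-parallel matrix multiply, the basic building block that runtimes such as vLLM and SGLang distribute across devices. It is an illustration of the general technique only, not sparkrun code: the function names (matmul, split_columns, column_parallel_matmul) are hypothetical, and the in-process concatenation stands in for the network gather that would run over InfiniBand/RDMA on a real cluster.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split the weight matrix column-wise into `parts` equal shards."""
    n = len(w[0])
    if n % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    width = n // parts
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(parts)]


def column_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Each shard computes its own slice of the output columns."""
    # Every "node" holds one shard of w and sees the full input x.
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    # Concatenate the per-shard slices row by row. On real hardware this
    # step is a gather across nodes rather than a local list join.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    print(column_parallel_matmul(x, w, parts=2))
```

The sharded result matches the unsharded product, which is why each node only needs to store its own fraction of the weights.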

Quick Start & Requirements

Primary install/run command: uvx sparkrun setup. This installs sparkrun and launches the guided setup wizard.
Prerequisites: NVIDIA DGX Spark hardware. The tool automatically detects ConnectX-7 NICs and configures InfiniBand/RDMA.
Links: Documentation, Quick Start, Recipes, Spark Arena.

Highlighted Details

Supports multiple inference runtimes: vLLM, SGLang, llama.cpp.
Enables multi-node tensor parallelism with automatic InfiniBand/RDMA detection.
Features VRAM estimation to check model fit before launch (sparkrun show <recipe>).
Uses Git-based recipe registries for model configurations and benchmarks.
Includes a guided setup wizard for cluster, SSH, and system configuration.
Automates model and container distribution across cluster nodes via SSH.
Integrates with Spark Arena, a community hub for LLM benchmarks on DGX Spark.
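
The summary does not say how the VRAM estimate behind sparkrun show <recipe> is calculated. As a rough illustration of what a fit check involves, the sketch below uses a common back-of-the-envelope formula: weight bytes plus KV-cache bytes, padded by an overhead factor and divided evenly across nodes. Everything in it is an assumption for illustration: the ModelShape fields, the 10% overhead default, the even split across nodes, and the 128 GiB per-node memory figure passed in the usage example.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical model description; not sparkrun's recipe schema."""
    params_billion: float
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_weight: float = 2.0  # 2.0 for FP16/BF16, about 0.5 for 4-bit
    bytes_per_kv: float = 2.0      # KV cache element size in bytes


def estimate_gib_per_node(
    model: ModelShape,
    context_len: int,
    batch_size: int,
    num_nodes: int = 1,
    overhead_fraction: float = 0.1,
) -> float:
    """Estimate memory needed per node, in GiB, under an even split."""
    weights = model.params_billion * 1e9 * model.bytes_per_weight
    # Factor 2 covers the separate key and value tensors in every layer.
    kv_cache = (
        2 * model.num_layers * model.num_kv_heads * model.head_dim
        * context_len * batch_size * model.bytes_per_kv
    )
    total = (weights + kv_cache) * (1.0 + overhead_fraction)
    return total / num_nodes / GIB


def fits(model: ModelShape, context_len: int, batch_size: int,
         num_nodes: int, node_memory_gib: float) -> bool:
    """Return True if the estimate is within the per-node memory budget."""
    needed = estimate_gib_per_node(model, context_len, batch_size, num_nodes)
    return needed <= node_memory_gib


if __name__ == "__main__":
    # Illustrative shape for a 70B-parameter model in 16-bit precision.
    big = ModelShape(params_billion=70, num_layers=80, num_kv_heads=8, head_dim=128)
    for nodes in (1, 2):
        need = estimate_gib_per_node(big, context_len=8192, batch_size=1, num_nodes=nodes)
        ok = fits(big, 8192, 1, nodes, node_memory_gib=128)
        print(f"{nodes} node(s): {need:.1f} GiB per node, fits={ok}")
```

An estimate like this ignores activation memory, runtime-specific allocators, and memory reserved by the operating system, so treat it as a lower bound rather than a guarantee.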

Maintenance & Community

The project is sponsored and appears to be actively maintained, with official recipes hosted on GitHub. The Spark Arena platform serves as a community hub for sharing recipes and benchmark results.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

sparkrun is designed specifically for NVIDIA DGX Spark hardware and infrastructure. It deliberately abstracts away cluster schedulers such as Slurm and container orchestrators such as Kubernetes, so it is a poor fit for environments that do not use DGX Spark systems or that need the finer-grained control those tools offer.
