sparkrun by spark-arena

LLM inference management for NVIDIA DGX Spark systems

Created 3 months ago
263 stars

Top 96.8% on SourcePulse

Project Summary

sparkrun is a command-line tool for launching, managing, and stopping Large Language Model (LLM) inference workloads on NVIDIA DGX Spark systems. It aims to spare users the complexity of traditional cluster management tools such as Slurm or Kubernetes, offering a streamlined workflow for those focused on running LLM inference.

How It Works

sparkrun exposes a single command-line interface for managing LLM workloads across one or more DGX Spark nodes. It supports multiple inference runtimes: vLLM, SGLang, and llama.cpp. For distributed inference it provides multi-node tensor parallelism, automatically detecting and configuring InfiniBand/RDMA networking. Workloads are defined as recipes in a Git-based registry system, which lets users access, share, and manage model configurations, including official, community, and custom benchmarks. A guided setup wizard automates the initial cluster configuration, including SSH mesh setup, network detection, and configuration of the resource management daemon.
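To make the idea of tensor parallelism concrete, here is a minimal sketch in plain Python. It is not sparkrun's code or that of any of the runtimes named above; the function names are hypothetical, and each "shard" stands in for one node. A weight matrix is split column-wise, every shard multiplies the same input by its own slice, and the partial outputs are concatenated, which is the step that fast interconnects such as InfiniBand/RDMA speed up in a real cluster.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain dense matrix product x @ w."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def split_columns(w: Matrix, shards: int) -> List[Matrix]:
    """Split w column-wise into equal slices, one per (hypothetical) node."""
    n_cols = len(w[0])
    if n_cols % shards != 0:
        raise ValueError("column count must divide evenly across shards")
    width = n_cols // shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(shards)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, shards: int) -> Matrix:
    """Compute x @ w by letting each shard handle its own column slice."""
    # Each shard computes its partial result independently.
    partials = [matmul(x, shard) for shard in split_columns(w, shards)]
    # Stand-in for the all-gather step: concatenate partial rows in shard order.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

# The sharded result matches the single-device result.
assert tensor_parallel_matmul(x, w, shards=2) == matmul(x, w)
```

The sketch only covers the column-parallel case; real engines combine several such splits per transformer layer.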

Quick Start & Requirements

  • Primary install/run command: uvx sparkrun setup (installs sparkrun and launches a guided setup wizard).
  • Prerequisites: NVIDIA DGX Spark systems are the target hardware. The tool automatically detects ConnectX-7 NICs and configures InfiniBand/RDMA.
  • Links: Documentation, Quick Start, Recipes, Spark Arena.
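For readers who want to script the first steps, the sketch below assembles the two commands that this summary mentions (uvx sparkrun setup and sparkrun show <recipe>) as argument lists and checks whether uvx is on the PATH. The helper names and the recipe name "my-recipe" are hypothetical, no other sparkrun subcommands or flags are assumed, and nothing is executed.

```python
import shutil
from typing import List


def build_setup_command() -> List[str]:
    """Argument list for the install/setup command quoted in this summary."""
    return ["uvx", "sparkrun", "setup"]


def build_show_command(recipe: str) -> List[str]:
    """Argument list for 'sparkrun show <recipe>'; the recipe name is caller-supplied."""
    if not recipe.strip():
        raise ValueError("recipe name must not be empty")
    return ["sparkrun", "show", recipe]


def uvx_available() -> bool:
    """Return True if a uvx executable can be found on the PATH."""
    return shutil.which("uvx") is not None


if __name__ == "__main__":
    print("uvx found:", uvx_available())
    print(" ".join(build_setup_command()))
    # "my-recipe" is a placeholder, not a real recipe from any registry.
    print(" ".join(build_show_command("my-recipe")))
```

Passing these lists to subprocess.run would launch the interactive wizard, so that step is best left to a terminal session.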

Highlighted Details

  • Supports multiple inference runtimes: vLLM, SGLang, llama.cpp.
  • Enables multi-node tensor parallelism with automatic InfiniBand/RDMA detection.
  • Features VRAM estimation to check model fit before launch (sparkrun show <recipe>).
  • Utilizes Git-based recipe registries for model configurations and benchmarks.
  • Includes a guided setup wizard for cluster, SSH, and system configuration.
  • Automates model and container distribution across cluster nodes via SSH.
  • Integrates with Spark Arena, a community hub for LLM benchmarks on DGX Spark.
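The source does not describe how the VRAM estimate is computed. As a rough illustration of what such a fit check involves, the sketch below uses a common back-of-envelope approximation: weight memory plus key/value cache memory, compared against available memory with some headroom. The function names, the 10% headroom, the 128 GB per-node figure, and the model shape in the example are all assumptions for illustration, not values taken from sparkrun.

```python
def estimate_memory_gb(
    params_billion: float,
    bytes_per_param: float,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_len: int,
    batch_size: int = 1,
    kv_bytes: int = 2,
) -> float:
    """Approximate memory (in GB, 1e9 bytes) for weights plus KV cache."""
    weights = params_billion * 1e9 * bytes_per_param
    # Factor 2 covers the separate key and value tensors in every layer.
    kv_cache = 2 * num_layers * num_kv_heads * head_dim * context_len * batch_size * kv_bytes
    return (weights + kv_cache) / 1e9


def fits(required_gb: float, node_memory_gb: float, nodes: int = 1, headroom: float = 0.9) -> bool:
    """Check the estimate against total memory, keeping some headroom free."""
    return required_gb <= node_memory_gb * nodes * headroom


# Illustrative 8B-parameter model shape in 16-bit precision (assumed values).
required = estimate_memory_gb(
    params_billion=8, bytes_per_param=2,
    num_layers=32, num_kv_heads=8, head_dim=128, context_len=8192,
)
print(f"estimated: {required:.2f} GB, fits on one node: {fits(required, node_memory_gb=128)}")
```

Activation buffers and runtime overhead are ignored here, so a real estimator would come out somewhat higher.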

Maintenance & Community

The project is sponsored and appears to be actively maintained, with official recipes hosted on GitHub. The Spark Arena platform serves as a community hub for sharing recipes and benchmark results.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

This tool is specifically designed for NVIDIA DGX Spark hardware and infrastructure. It deliberately abstracts away standard cluster schedulers like Slurm and container orchestrators like Kubernetes, which may be a limitation for environments not utilizing DGX Spark systems or requiring finer-grained control offered by those tools.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 20
  • Issues (30d): 16
  • Star history: 110 stars in the last 30 days
