spark-jobserver by spark-jobserver

REST API for Apache Spark job management

created 11 years ago
2,843 stars

Top 17.1% on sourcepulse

Project Summary

This project provides a RESTful API for managing and executing Apache Spark jobs, offering a "Spark as a Service" solution. It targets developers and data engineers who need to submit, monitor, and manage Spark applications programmatically. It also supports long-running contexts, so jobs can reuse an existing SparkContext (and its cached data) instead of paying Spark's startup cost on every run.

How It Works

Spark Job Server exposes a REST API that allows clients to upload job JARs, create and manage Spark contexts (either transient or persistent), and submit jobs for execution. It supports asynchronous and synchronous job submission, job status querying, and context management. The architecture is actor-based, promoting loose coupling and scalability.
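The request flow described above can be sketched with Python's standard library. This is a minimal sketch, not an official client. It assumes a server on the default port 8090 and README-style routes (/jars/<appName>, /contexts/<name>, /jobs, /jobs/<jobId>). Newer releases may expose uploads under a different route, such as /binaries, so check the API docs for the version you run. The helper names (upload_jar, create_context, submit_job, job_status) are hypothetical. The functions only build urllib.request.Request objects; pass one to urllib.request.urlopen to actually send it.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: a local jobserver on its default port 8090, with README-style
# routes (/jars, /contexts, /jobs). All helper names here are hypothetical.
BASE_URL = "http://localhost:8090"


def upload_jar(app_name: str, jar_bytes: bytes) -> urllib.request.Request:
    """POST the raw JAR bytes, registering them under app_name."""
    return urllib.request.Request(
        f"{BASE_URL}/jars/{urllib.parse.quote(app_name)}",
        data=jar_bytes,
        method="POST",
    )


def create_context(name: str, num_cpu_cores: int, memory_per_node: str) -> urllib.request.Request:
    """POST to create a persistent, named context that later jobs can reuse."""
    query = urllib.parse.urlencode(
        {"num-cpu-cores": num_cpu_cores, "memory-per-node": memory_per_node}
    )
    return urllib.request.Request(
        f"{BASE_URL}/contexts/{urllib.parse.quote(name)}?{query}",
        data=b"",
        method="POST",
    )


def submit_job(
    app_name: str,
    class_path: str,
    config: str,
    context: Optional[str] = None,
    sync: bool = False,
) -> urllib.request.Request:
    """POST a job; the request body is the job's config as Typesafe Config text."""
    params = {"appName": app_name, "classPath": class_path}
    if context is not None:
        params["context"] = context  # run inside an existing persistent context
    if sync:
        params["sync"] = "true"  # wait for the result instead of returning a job ID
    return urllib.request.Request(
        f"{BASE_URL}/jobs?{urllib.parse.urlencode(params)}",
        data=config.encode("utf-8"),
        method="POST",
    )


def job_status(job_id: str) -> urllib.request.Request:
    """GET the status of an asynchronously submitted job."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{urllib.parse.quote(job_id)}", method="GET"
    )


if __name__ == "__main__":
    req = submit_job(
        "test",
        "spark.jobserver.WordCountExample",
        "input.string = a b c a b see",
        context="test-context",
    )
    # Only prints the request; urllib.request.urlopen(req) would send it.
    print(req.get_method(), req.full_url)
```

When sync is left off, the POST to /jobs returns right away with a job ID that can be polled through job_status. When sync is set to true, the call waits for the job's result instead.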

Quick Start & Requirements

  • Install/Run: Docker is the recommended way to start. Alternatively, build and run locally via SBT (sbt package, then job-server-extras/reStart).
  • Prerequisites: SBT, Java, Apache Spark. Docker image includes Spark.
  • Links: Troubleshooting, EC2 Deploy, EMR Deploy
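The server can take a while to come up, whether it is started from the Docker image or via job-server-extras/reStart (the SBT route compiles first). Scripted submissions can be gated on a small readiness check like the one sketched below. The sketch assumes the server listens on localhost:8090 and that GET /contexts, which lists running contexts, answers with HTTP 200 once the server is ready. The function name and its retry parameters are hypothetical. The opener argument exists only so the loop can be exercised without a live server.

```python
import time
import urllib.request
from typing import Any, Callable


def wait_for_jobserver(
    base_url: str = "http://localhost:8090",  # assumed default host and port
    attempts: int = 30,
    delay_s: float = 2.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> bool:
    """Poll GET {base_url}/contexts until it returns HTTP 200.

    Returns True once the server answers, or False after `attempts` failed tries.
    """
    for _ in range(attempts):
        try:
            with opener(f"{base_url}/contexts", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and timeouts are all
            # OSError subclasses: treat them as "not ready yet" and retry.
            pass
        time.sleep(delay_s)
    return False
```

A deploy script might call wait_for_jobserver() right after starting the container and abort if it returns False. The fixed delay keeps the sketch simple; exponential backoff would be an equally reasonable choice.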

Highlighted Details

  • Supports Scala 2.11 and 2.12.
  • Offers both the original and the newer SparkJob APIs; the newer API is more type-safe.
  • Includes features like Named Objects (RDDs, DataFrames) for inter-job sharing and caching.
  • Provides authentication (Shiro, Keycloak) and SSL/TLS support.
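When authentication and SSL/TLS are enabled, every client request must carry credentials over HTTPS. The sketch below assumes the server's Shiro setup accepts HTTP Basic credentials. It also assumes you have the CA certificate that signed the server's certificate. Both helper names are hypothetical, and a Keycloak-backed setup may need a different token flow.

```python
import base64
import ssl
import urllib.request
from typing import Optional


def basic_auth_request(url: str, user: str, password: str, method: str = "GET") -> urllib.request.Request:
    """Build a request carrying HTTP Basic credentials (base64 of 'user:password').

    Assumption: the jobserver's auth layer is configured to accept Basic auth.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"}, method=method)


def server_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that verifies the server certificate and hostname.

    Pass ca_file for a self-signed or internal CA; None uses the system trust store.
    """
    return ssl.create_default_context(cafile=ca_file)


# Usage (not executed here; host, credentials and CA path are placeholders):
# req = basic_auth_request("https://localhost:8090/contexts", "user", "pw")
# urllib.request.urlopen(req, context=server_tls_context("/path/to/ca.pem"))
```

Basic credentials are only base64-encoded, not encrypted, so they should travel only over the verified HTTPS connection and never over plain HTTP.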

Maintenance & Community

  • Originally developed at Ooyala; this repository is now the main development repo.
  • Users include Netflix, DataStax, and KNIME.
  • Community discussions via Google Groups: spark-jobserver.
  • Bug reports and issues: GitHub Issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • The "Context per JVM" feature is experimental and has known issues with context shutdown.
  • Some tests may fail on Windows.
  • Release binaries were deleted due to Bintray sunset; only recent releases are available on JFrog.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 90 days