sail  by lakehq

Computation framework unifying data processing and AI workloads

created 1 year ago
865 stars

Top 42.3% on sourcepulse

Project Summary

Sail is a computation framework designed to unify batch processing, stream processing, and AI workloads, offering a drop-in replacement for Spark SQL and the Spark DataFrame API. It targets data engineers and AI practitioners seeking to streamline complex data pipelines and improve performance.

How It Works

Sail acts as a Spark Connect server, enabling existing PySpark applications to connect to a Sail backend without code modifications. This approach leverages the familiar Spark API while introducing performance optimizations and potentially reducing infrastructure costs.
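
One way to picture "without code modifications" is a minimal sketch that relies on PySpark's standard SPARK_REMOTE environment variable (a Spark Connect client feature, not something the Sail README is quoted on here). The helper name point_pyspark_at_sail and the sample job are hypothetical, and the host and port are assumed to match the Quick Start address.

```python
import os


def point_pyspark_at_sail(host: str = "localhost", port: int = 50051) -> str:
    """Set SPARK_REMOTE so unchanged PySpark code opens a Spark Connect session.

    Assumption: a Sail server is listening on host:port.
    """
    url = f"sc://{host}:{port}"
    os.environ["SPARK_REMOTE"] = url
    return url


def existing_job():
    """Stand-in for an existing PySpark job; nothing in it mentions Sail."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    # With SPARK_REMOTE set, getOrCreate() builds a Spark Connect session.
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    return df.filter(df.id > 1).collect()
```

Calling point_pyspark_at_sail() before existing_job() would route the job to the Sail backend, provided a server is running and the installed PySpark includes Spark Connect support.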

Quick Start & Requirements

  • Install: pip install "pysail[spark]"
  • Prerequisites: Python. Installation from source is recommended for optimized performance.
  • Getting Started: Connect to a running Sail server using SparkSession.builder.remote("sc://localhost:50051").getOrCreate().
  • Documentation: https://docs.lake.sail.io/
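
The Getting Started step can be sketched as a small script. It assumes a Sail server is already running at localhost:50051 and that pysail[spark] was installed as shown above; sail_remote_url, connect, and demo are hypothetical helper names used only for illustration.

```python
def sail_remote_url(host: str = "localhost", port: int = 50051) -> str:
    """Build a Spark Connect URL of the form sc://host:port."""
    return f"sc://{host}:{port}"


def connect(host: str = "localhost", port: int = 50051):
    """Open a PySpark session against a running Sail server (assumed reachable)."""
    # Imported lazily so sail_remote_url works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(sail_remote_url(host, port)).getOrCreate()


def demo():
    """Run one trivial query; requires a live server, so it is not called here."""
    spark = connect()
    spark.sql("SELECT 1 + 1 AS two").show()
    spark.stop()
```

Running demo() against a live server should print a one-row result table; without a server the connection attempt will fail, which is why the sketch only defines the function.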

Highlighted Details

  • Claims up to 4x speed improvement and 94% cost reduction compared to standard Spark in benchmarks.
  • Supports both single-host and distributed deployments, including Kubernetes.
  • Offers an MCP server for integrating Spark data analytics with LLM agents.

Maintenance & Community

  • Active development with contributions welcomed via GitHub issues, discussions, and pull requests.
  • Enterprise support is available from LakeSail.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Sail is presented as a replacement for Spark SQL and the Spark DataFrame API, so client applications still depend on PySpark and the Spark Connect protocol. The README does not detail which Spark versions are supported or how complete feature parity is.

Health Check

  • Last commit: 18 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 74
  • Issues (30d): 35
  • Star history: 142 stars in the last 90 days
