sqlframe by eakmanrq

DataFrame API for running PySpark code on various database engines

created 1 year ago
415 stars

Top 71.7% on sourcepulse

Project Summary

SQLFrame provides a PySpark-compatible DataFrame API for executing data transformations directly on various SQL database engines, eliminating the need for Spark clusters. It targets users who want to leverage their existing database's processing power, run PySpark code locally without Spark overhead, or generate SQL representations of their DataFrame logic for debugging and sharing.

How It Works

SQLFrame translates PySpark DataFrame operations into SQL queries tailored to specific database backends. It supports BigQuery, Databricks, DuckDB, PostgreSQL, Snowflake, and Spark, and Redshift support is in development. A "Standalone" session generates SQL without connecting to any database. You can configure the input and output SQL dialects, and an optional OpenAI integration can make the generated SQL easier to read.

Quick Start & Requirements

Install with pip install "sqlframe[<engine>]" (e.g., sqlframe[bigquery], sqlframe[duckdb]) or conda install -c conda-forge sqlframe. Specific engine documentation may have additional setup instructions.

Highlighted Details

  • PySpark DataFrame API compatibility for direct execution on SQL databases.
  • Supports BigQuery, Databricks, DuckDB, PostgreSQL, Snowflake, Spark, and Redshift (in development).
  • "Standalone" mode for SQL generation without a database connection.
  • Optional OpenAI integration for making generated SQL more readable.

Maintenance & Community

No specific community links or contributor information are provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The Redshift engine is still in development and lacks test coverage and documentation. The README does not specify a license, which may hinder commercial adoption.

Health Check

Last commit: 1 day ago
Responsiveness: 1 day
Pull Requests (30d): 16
Issues (30d): 7
Star History: 31 stars in the last 90 days
