DataFrame API for running PySpark code on various database engines
SQLFrame provides a PySpark-compatible DataFrame API for executing data transformations directly on various SQL database engines, eliminating the need for Spark clusters. It targets users who want to leverage their existing database's processing power, run PySpark code locally without Spark overhead, or generate SQL representations of their DataFrame logic for debugging and sharing.
How It Works
SQLFrame translates PySpark DataFrame operations into SQL queries tailored for specific database backends. It supports multiple engines like BigQuery, Databricks, DuckDB, PostgreSQL, Snowflake, and Spark, with Redshift in development. A "Standalone" session can generate SQL without connecting to a database. The library allows customization of SQL dialects for input and output, and can optionally integrate with OpenAI for enhanced SQL generation.
Quick Start & Requirements
Install with `pip install "sqlframe[<engine>]"` (e.g., `sqlframe[bigquery]`, `sqlframe[duckdb]`) or `conda install -c conda-forge sqlframe`. Specific engine documentation may have additional setup instructions.
Maintenance & Community
No specific community links or contributor information are provided in the README.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
The Redshift engine is still in development, with limited test coverage and documentation. The absence of a stated license in the README may impact commercial adoption.