sqlframe by eakmanrq

DataFrame API for running PySpark code on various database engines

Created 1 year ago
430 stars

Top 69.0% on SourcePulse

Project Summary

SQLFrame provides a PySpark-compatible DataFrame API for executing data transformations directly on various SQL database engines, eliminating the need for Spark clusters. It targets users who want to leverage their existing database's processing power, run PySpark code locally without Spark overhead, or generate SQL representations of their DataFrame logic for debugging and sharing.

How It Works

SQLFrame translates PySpark DataFrame operations into SQL queries tailored to specific database backends. It supports BigQuery, Databricks, DuckDB, PostgreSQL, Snowflake, and Spark, with Redshift support in development. A "Standalone" session generates SQL without connecting to any database. Users can configure the SQL dialect used for input and for output, and can optionally use OpenAI to improve the readability of the generated SQL.
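The translation idea can be sketched with a toy, lazily evaluated DataFrame that only records operations and compiles them to SQL on demand, with no database connection (similar in spirit to the "Standalone" session). This is not SQLFrame's implementation or API: `ToyFrame`, its methods, the raw-string predicates, and the per-dialect quoting table are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical, simplified identifier-quoting rules per dialect (illustration only).
QUOTES = {"postgres": ('"', '"'), "duckdb": ('"', '"'), "bigquery": ("`", "`")}


@dataclass(frozen=True)
class ToyFrame:
    """Immutable, lazy 'DataFrame' that records operations instead of running them."""
    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()  # assumption: predicates are raw SQL strings
    row_limit: Optional[int] = None

    def select(self, *cols: str) -> "ToyFrame":
        # Each call returns a new frame; nothing is executed yet.
        return replace(self, columns=tuple(cols))

    def where(self, condition: str) -> "ToyFrame":
        return replace(self, predicates=self.predicates + (condition,))

    def limit(self, n: int) -> "ToyFrame":
        return replace(self, row_limit=n)

    def sql(self, dialect: str = "postgres") -> str:
        """Compile the recorded operations into one SQL string for the given dialect."""
        open_q, close_q = QUOTES[dialect]

        def quote(name: str) -> str:
            return name if name == "*" else f"{open_q}{name}{close_q}"

        parts = [f"SELECT {', '.join(quote(c) for c in self.columns)} FROM {quote(self.table)}"]
        if self.predicates:
            parts.append("WHERE " + " AND ".join(f"({p})" for p in self.predicates))
        if self.row_limit is not None:
            parts.append(f"LIMIT {self.row_limit}")
        return " ".join(parts)


df = ToyFrame("employees").where("age > 30").select("name", "age").limit(10)
print(df.sql("postgres"))  # SELECT "name", "age" FROM "employees" WHERE (age > 30) LIMIT 10
print(df.sql("bigquery"))  # SELECT `name`, `age` FROM `employees` WHERE (age > 30) LIMIT 10
```

The same recorded pipeline yields different SQL text per target dialect, which is the property that lets one DataFrame program target several engines. A real translator would work on an expression tree rather than raw strings so that functions and types can also be rewritten per dialect.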

Quick Start & Requirements

Install with pip install "sqlframe[<engine>]" (e.g., sqlframe[bigquery], sqlframe[duckdb]) or conda install -c conda-forge sqlframe. Specific engine documentation may have additional setup instructions.

Highlighted Details

  • PySpark DataFrame API compatibility for direct execution on SQL databases.
  • Supports BigQuery, Databricks, DuckDB, PostgreSQL, Snowflake, Spark, and Redshift (in development).
  • "Standalone" mode for SQL generation without a database connection.
  • Optional OpenAI integration for SQL optimization and readability.
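Direct execution means the generated query runs inside the database engine and only the result rows come back to Python. The sketch below illustrates this with the standard-library `sqlite3` module as a stand-in for an engine such as DuckDB or PostgreSQL; `grouped_sum_sql`, `run_on_engine`, and `demo` are hypothetical helpers, not part of SQLFrame.

```python
import sqlite3
from typing import List, Tuple


def grouped_sum_sql(table: str, key: str, value: str) -> str:
    # Hypothetical hand-written equivalent of a PySpark pipeline like
    # df.groupBy(key).agg(F.sum(value).alias("total")).orderBy(key)
    return (
        f'SELECT "{key}", SUM("{value}") AS "total" '
        f'FROM "{table}" GROUP BY "{key}" ORDER BY "{key}"'
    )


def run_on_engine(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    # The engine does the grouping and summing; Python only receives the result rows.
    return conn.execute(sql).fetchall()


def demo() -> List[Tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("west", 5), ("east", 7)],
    )
    try:
        return run_on_engine(conn, grouped_sum_sql("sales", "region", "amount"))
    finally:
        conn.close()


print(demo())  # [('east', 17), ('west', 5)]
```

Pushing the aggregation into the engine avoids moving raw rows into a separate processing cluster, which is the motivation for running DataFrame code against an existing database.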

Maintenance & Community

No specific community links or contributor information are provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The Redshift engine is noted as still in development and lacks test coverage and documentation. The README does not specify a license, which may hinder commercial adoption.

Health Check
Last Commit: 22 hours ago
Responsiveness: 1 day
Pull Requests (30d): 26
Issues (30d): 18
Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Travis Fischer (founder of Agentic), and 1 more.

vanna by vanna-ai

Top 0.4% on SourcePulse
20k stars
Python RAG framework for SQL generation
Created 2 years ago
Updated 5 months ago