LLM-Text-to-SQL-Architectures by arunpshankar

LLM-powered Text-to-SQL architectures

Created 2 years ago
253 stars

Top 99.3% on SourcePulse

Project Summary

This repository collects architectural patterns for using Large Language Models (LLMs) to generate SQL queries from natural language. It targets engineers and researchers who want to simplify database access by translating complex natural language questions into executable SQL, with a specific focus on Google BigQuery. The project offers practical implementations and a guide to several LLM-driven approaches for robust, performant Text-to-SQL generation.

How It Works

The project explores five architectural patterns for Text-to-SQL:

  • Using LLMs for intent detection and entity extraction.
  • Retrieval-Augmented Generation (RAG) over schema metadata for context-aware query formulation.
  • Autonomous SQL agents that refine their queries iteratively.
  • Direct schema inference combined with self-correction, which uses execution feedback to resolve errors.
  • Stochastic optimization, which selects the fastest-executing query from multiple trials.

Together, these patterns aim to improve the accuracy, robustness, and performance of LLM-based SQL generation.
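The stochastic optimization pattern can be illustrated with a minimal sketch of its selection step. It assumes the candidate queries have already been sampled from an LLM (for example, several generations at a nonzero temperature) and uses Python's built-in sqlite3 as a stand-in for BigQuery. The function name `pick_fastest_query` and the `trials` parameter are hypothetical, not taken from the repository. Each candidate is timed over a few runs, and candidates that fail to execute are discarded.

```python
import sqlite3
import time
from typing import List, Optional


def pick_fastest_query(
    conn: sqlite3.Connection, candidates: List[str], trials: int = 3
) -> Optional[str]:
    """Return the candidate SQL with the lowest best-of-`trials` runtime.

    Candidates that raise a database error are skipped. Returns None if no
    candidate executes successfully.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    best_sql: Optional[str] = None
    best_time = float("inf")
    for sql in candidates:
        try:
            timings = []
            for _ in range(trials):
                start = time.perf_counter()
                conn.execute(sql).fetchall()  # fetch every row so the whole query runs
                timings.append(time.perf_counter() - start)
        except sqlite3.Error:
            continue  # query failed to execute: drop it from the comparison
        fastest_run = min(timings)  # best-of-N damps timing noise
        if fastest_run < best_time:
            best_sql, best_time = sql, fastest_run
    return best_sql


if __name__ == "__main__":
    # Assumption: sqlite3 stands in for BigQuery, and these candidates stand in
    # for queries sampled from an LLM; the second one is deliberately broken.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)", [("EU", 10.0), ("US", 20.0), ("EU", 5.0)]
    )
    candidates = [
        "SELECT region, SUM(amount) FROM sales GROUP BY region",
        "SELECT regoin, SUM(amount) FROM sales GROUP BY regoin",
        "SELECT region, TOTAL(amount) FROM sales GROUP BY region",
    ]
    print(pick_fastest_query(conn, candidates))
```

Taking the best of several runs reduces noise from caching and scheduling. This sketch does not check that the surviving candidates return the same rows; a fuller version would likely compare results before trusting the fastest one.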

Quick Start & Requirements

  • Install:
    git clone https://github.com/arunpshankar/LLM-Text-to-SQL-Architectures.git
    cd LLM-Text-to-SQL-Architectures
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    
  • Prerequisites: Python 3 with a virtual environment. Dependencies are listed in requirements.txt. Pattern III requires an ODBC connection to BigQuery. Pattern IV discusses the Code-Chat Bison model.
  • Links: Further details are available in an accompanying Medium article (title: "Architectural Patterns for Text-to-SQL: Leveraging LLMs for Enhanced BigQuery Interactions"), CONTRIBUTING.md, and LICENSE.md.

Highlighted Details

  • Focuses on practical application within BigQuery environments.
  • Implements five distinct LLM-based Text-to-SQL architectural patterns.
  • Features advanced techniques like Retrieval-Augmented Generation (RAG), autonomous SQL agents, and self-correcting query generation.
  • Explores the use of the Code-Chat Bison model for potential cost and latency optimizations.
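The RAG and self-correction techniques listed above can be combined into a single generate, execute, and repair loop. The sketch below illustrates that idea under stated assumptions and is not the repository's implementation: sqlite3 stands in for BigQuery, table retrieval uses simple word overlap instead of embedding search, and `llm` is a hypothetical callable standing in for a model such as Code-Chat Bison. The names `retrieve_schema` and `generate_with_self_correction` are likewise hypothetical.

```python
import re
import sqlite3
from typing import Callable, List, Set, Tuple

# Assumption: a hypothetical model interface, prompt text in, SQL text out.
LLM = Callable[[str], str]


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve_schema(conn: sqlite3.Connection, question: str, k: int = 2) -> List[str]:
    """Toy RAG step: rank tables by word overlap between the question and each
    table's CREATE statement (assumption: a stand-in for embedding search)."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    q = _tokens(question)
    ranked = sorted(rows, key=lambda r: len(q & _tokens(r[0] or "")), reverse=True)
    return [ddl for (ddl,) in ranked[:k]]


def generate_with_self_correction(
    conn: sqlite3.Connection, question: str, llm: LLM, max_attempts: int = 3
) -> Tuple[str, list]:
    """Generate SQL with retrieved schema context, execute it, and feed any
    database error back to the model until a query runs or attempts run out."""
    schema = "\n".join(retrieve_schema(conn, question))
    prompt = f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
    for _ in range(max_attempts):
        sql = llm(prompt).strip()
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as err:
            # Execution feedback: append the failed query and the error text
            # so the next attempt can repair it.
            prompt += f" {sql}\n-- Error: {err}\nCorrected SQL:"
    raise RuntimeError(f"no executable SQL after {max_attempts} attempts")
```

For example, a stub `llm` that first returns a query with a misspelled column and then, once the error appears in its prompt, the corrected query succeeds on the second attempt. Capping `max_attempts` bounds cost and latency when the model keeps producing invalid SQL.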

Maintenance & Community

Contribution guidelines are available in CONTRIBUTING.md. The README does not list community channels (e.g., Discord, Slack), notable maintainers, or sponsors.

Licensing & Compatibility

The project is licensed under the MIT License, which permits broad use, including commercial use, provided the copyright and license notice is retained.

Limitations & Caveats

The README's "Challenges and Limitations" section is marked "In Progress," so its coverage of pitfalls, areas for improvement, and known issues may be incomplete; more detail may be in the linked Medium article. The README itself does not spell out specific limitations.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
