Awesome-LLM-based-Text2SQL  by DEEP-PolyU

Advancing LLM-based Text-to-SQL generation

Created 1 month ago
371 stars

Top 76.2% on SourcePulse

Project Summary

This repository serves as a comprehensive, continuously updated catalog of resources for Large Language Model (LLM)-based Text-to-SQL. It targets researchers and practitioners in database interfaces and natural language processing, offering a curated collection of surveys, papers, benchmarks, datasets, and open-source projects to accelerate development and understanding in this rapidly evolving field. The project is anchored by a survey paper accepted by IEEE TKDE in 2025.

How It Works

In the basic workflow, an LLM receives a natural language question together with the relevant database schema and generates an executable SQL query. That query is then run against the target database to retrieve the results that answer the user's original question. The goal is a more intuitive and powerful interface for interacting with databases.
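The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository or from any listed project: `schema_text`, `build_prompt`, `answer`, and `fake_llm` are hypothetical names, the prompt layout is an assumption, and the model call is replaced by a stub that returns a fixed query so the example runs offline against an in-memory SQLite database.

```python
import sqlite3
from typing import Callable, List, Tuple


def schema_text(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements that describe the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(question: str, schema: str) -> str:
    """Combine schema and question into one prompt (layout is an assumption)."""
    return f"-- Schema:\n{schema}\n-- Question: {question}\n-- SQL:"


def answer(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str], str],
) -> Tuple[str, List[tuple]]:
    """Question + schema -> SQL from the model -> rows from the database."""
    prompt = build_prompt(question, schema_text(conn))
    sql = generate_sql(prompt)
    return sql, conn.execute(sql).fetchall()


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a fixed query."""
    return "SELECT name FROM singer WHERE age > 30 ORDER BY name"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer (name, age) VALUES (?, ?)",
    [("Ana", 41), ("Bo", 25), ("Cy", 33)],
)

sql, rows = answer(conn, "Which singers are older than 30?", fake_llm)
print(sql)
print(rows)
```

In a real system, `generate_sql` would wrap a call to whichever model or framework is in use, and the generated SQL would normally be validated or run with read-only permissions before execution.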

Quick Start & Requirements

This repository functions as a curated index and does not provide a direct installation or execution command. Users are directed to the linked papers and projects for specific setup instructions, dependencies (e.g., Python versions, GPU requirements, CUDA), and execution details. Key resources include links to numerous survey papers, prominent benchmarks like BIRD and Spider (versions 1.0 and 2.0), and a wide array of original and post-annotated datasets.

Highlighted Details

  • Extensive Resource Curation: Catalogs over a dozen survey papers, multiple benchmark evaluations (including BIRD, Spider 1.0/2.0, BIRD-CRITIC, BIRD-INTERACT), and a comprehensive list of both original and post-annotated datasets.
  • Research Trends & Taxonomy: Tracks the field's evolution, highlighting the significant shift towards LLM-driven advancements since 2023. It organizes methodologies into In-context Learning and Fine-tuning approaches, detailing numerous related papers and projects within this structure.
  • Benchmark Performance Data: Provides tables with performance metrics (e.g., EX, Snow Score, Lite Score, SR, Reward) for leading models on benchmarks like BIRD and Spider, updated with recent (2023-2025) results.
  • Project Showcase: Lists and links to notable open-source projects and frameworks such as SQLGlot, DB-GPT, DB-GPT-Hub, Awesome-Text2SQL, and PremSQL.
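Of the metrics named above, EX (execution accuracy) is commonly computed by running the predicted and the reference query against the same database and checking whether their results match. The sketch below illustrates that idea under stated assumptions: results are compared as unordered multisets of rows, and a predicted query that fails to execute counts as a miss. Individual benchmarks define their own comparison rules, so this is not the official scorer of BIRD or Spider, and `run` and `execution_accuracy` are hypothetical helper names.

```python
import sqlite3
from collections import Counter
from typing import List, Optional, Tuple


def run(conn: sqlite3.Connection, sql: str) -> Optional[Counter]:
    """Execute a query and return its rows as a multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(
    conn: sqlite3.Connection, pairs: List[Tuple[str, str]]
) -> float:
    """Fraction of (predicted, gold) pairs whose result sets match."""
    hits = 0
    for predicted, gold in pairs:
        predicted_rows = run(conn, predicted)
        gold_rows = run(conn, gold)
        # Assumption: row order is ignored, duplicate rows are significant.
        if predicted_rows is not None and predicted_rows == gold_rows:
            hits += 1
    return hits / len(pairs) if pairs else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Ana", "eng", 120), ("Bo", "eng", 100), ("Cy", "ops", 90)],
)

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM emp WHERE salary >= 100",
     "SELECT name FROM emp WHERE salary > 95"),
    # Valid SQL, wrong rows: counted as incorrect.
    ("SELECT name FROM emp WHERE dept = 'ops'",
     "SELECT name FROM emp WHERE dept = 'eng'"),
    # Invalid SQL: counted as incorrect.
    ("SELEC name FROM emp", "SELECT name FROM emp"),
]

score = execution_accuracy(conn, pairs)
print(f"EX = {score:.3f}")
```

Comparing results rather than query text is what lets two differently written but equivalent queries both score as correct, as the first pair shows.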

Maintenance & Community

The repository is actively maintained and builds on the survey paper accepted by IEEE TKDE in 2025. Contributions are welcome via GitHub issues and pull requests. The maintainers can be reached by email at zijin[dot]hong[at]connect[dot]polyu[dot]hk.

Licensing & Compatibility

The repository itself does not specify a license. Users are strongly advised to refer to the individual linked papers and projects for their respective licensing terms and compatibility restrictions, particularly concerning commercial use or integration into closed-source systems.

Limitations & Caveats

As a curated list, this repository offers no direct functionality; it is an index of external research resources. Because the repository itself carries no explicit license, the usage rights of each linked project must be reviewed individually. The content is primarily research-oriented and covers advancements reported up to late 2025.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 368 stars in the last 30 days
