Awesome-Text2SQL by eosphoros-ai

Curated list of Text2SQL resources for LLMs, DSLs, APIs, and visualization

Created 2 years ago

3,503 stars

Top 13.7% on SourcePulse

Project Summary

This repository is a curated collection of resources, tutorials, and benchmarks for Text-to-SQL and the broader task of building Natural Language Interfaces to Databases (NLIDB). It targets researchers and practitioners in the NLP and database communities and gives a broad overview of the field's advances, models, datasets, and evaluation metrics.

How It Works

The project serves as a central hub for the Text-to-SQL ecosystem. It categorizes and links to key research papers, foundation LLMs (such as Llama, Mistral, and Qwen), fine-tuning techniques (LoRA, QLoRA, RLHF), and benchmark datasets (WikiSQL, Spider, BIRD-SQL). It traces the evolution of Text-to-SQL from traditional methods to LLM-driven approaches and emphasizes two performance metrics: Exact Match (EM), which checks whether the predicted SQL matches the reference query, and Execution Accuracy (EX), which checks whether both queries return the same results when run against the database.
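To make the difference between the two metrics concrete, here is a minimal Python sketch. It is not taken from the repository or from any official evaluator. The `singer` table and the example queries are hypothetical. EM is approximated as a normalized string comparison, whereas the official Spider evaluator compares parsed SQL clauses component by component, so real EM scores will differ. EX compares result rows as multisets (row order ignored), which is one common simplification.

```python
import re
import sqlite3
from collections import Counter


def normalize_sql(sql: str) -> str:
    """Crude normalization: drop a trailing semicolon, collapse whitespace, lowercase.

    Simplification: no SQL parsing is done, and lowercasing also folds string literals.
    """
    sql = sql.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", sql).lower()


def exact_match(pred: str, gold: str) -> bool:
    """String-level approximation of Exact Match (EM)."""
    return normalize_sql(pred) == normalize_sql(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Execution Accuracy (EX) for one example: do both queries return the same rows?"""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    # Compare as multisets so row order is ignored (an assumption; queries with
    # ORDER BY would need an order-sensitive comparison).
    return Counter(pred_rows) == Counter(gold_rows)


# Hypothetical toy database and model predictions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Alice", 30), ("Bob", 25), ("Cara", 41)],
)

gold = "SELECT name FROM singer WHERE age > 28"
predictions = [
    "select name  from singer where age > 28;",     # same query, different formatting
    "SELECT name FROM singer WHERE NOT age <= 28",  # different text, same result
    "SELECT name FROM singer WHERE age > 30",       # different result (drops Alice)
]

em_score = sum(exact_match(p, gold) for p in predictions) / len(predictions)
ex_score = sum(execution_match(conn, p, gold) for p in predictions) / len(predictions)
print(f"EM = {em_score:.2f}, EX = {ex_score:.2f}")  # EM = 0.33, EX = 0.67
```

The second prediction shows why EX is usually higher than EM: semantically equivalent SQL written differently fails EM but passes EX. The reverse risk also exists, since a wrong query can happen to return the correct rows on one particular database and still be counted as correct by EX.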

Quick Start & Requirements

This is a curated list of resources, not a runnable software package. To use it, you need to follow the linked papers, code repositories, and datasets, each of which may have its own requirements (e.g., Python, deep learning frameworks, or specific hardware for running LLMs).

Highlighted Details

Comprehensive leaderboards for the major Text-to-SQL benchmarks (WikiSQL, Spider, BIRD), listing state-of-the-art results together with model names and dates.
Extensive lists of foundation LLMs, fine-tuning methods, and datasets relevant to Text-to-SQL, including recent work with links to the associated papers and code.
Detailed explanations of evaluation metrics such as Execution Accuracy (EX) and Exact Match (EM), which are needed to interpret the leaderboard results.
Practical projects and libraries such as DB-GPT-Hub and MindSQL, which provide hands-on tools for Text-to-SQL development.

Maintenance & Community

The repository is maintained by the eosphoros-ai organization and explicitly invites community contributions. It links to related projects such as Awesome-AIGC-Tutorials and describes the organization's own focus on privacy-preserving LLM solutions.

Licensing & Compatibility

The repository itself is a collection of links and information; the licenses of the linked resources (papers, code, datasets) vary and must be checked individually. The compilation can generally be used for research and commercial work, but only to the extent that the licenses of the underlying resources allow.

Limitations & Caveats

As a curated list, this repository does not provide a unified API or a single executable; users must navigate and integrate the linked resources on their own. Because LLMs evolve rapidly, the leaderboards and reported model performance can quickly become outdated.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Star History

54 stars in the last 30 days