NL2SQL_Handbook by HKUSTDial

NL2SQL handbook for tracking text-to-SQL techniques

Created 1 year ago
971 stars

Top 38.0% on SourcePulse

Project Summary

This repository is a continuously updated handbook of Natural Language to SQL (NL2SQL) techniques for researchers and practitioners. It offers practical guidance and tracks recent advances, particularly those driven by Large Language Models (LLMs), and gives a structured overview of the field's evolution, challenges, and solutions.

How It Works

The handbook categorizes NL2SQL methods into Pre-processing, Translation, and Post-processing modules, detailing how LLMs are integrated into each stage. It traces the evolution of NL2SQL solutions through four stages, analyzing changes in target users and addressed challenges. The project also provides a "river diagram" to visualize the historical development of NL2SQL techniques.
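
The three-module split can be made concrete with a small sketch. The code below is illustrative only and is not taken from the handbook: the function names, the keyword-overlap schema filter standing in for pre-processing, the `fake_llm` stub standing in for a real model, and the SQLite validity check standing in for post-processing are all assumptions chosen to keep the example runnable.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Prompt:
    question: str
    schema: str  # DDL of the tables judged relevant to the question


def preprocess(question: str, conn: sqlite3.Connection) -> Prompt:
    """Pre-processing (assumed here to be naive schema filtering):
    keep tables whose name or a column name appears in the question."""
    words = {w.strip("?,.").lower() for w in question.split()}
    kept = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for name, ddl in tables:
        cols = {row[1].lower() for row in conn.execute(f'PRAGMA table_info("{name}")')}
        if name.lower() in words or words & cols:
            kept.append(ddl)
    return Prompt(question, "\n".join(kept))


def translate(prompt: Prompt, llm) -> str:
    """Translation: `llm` is any callable mapping a prompt string to SQL text."""
    text = f"Schema:\n{prompt.schema}\n\nQuestion: {prompt.question}\nSQL:"
    return llm(text)


def postprocess(sql: str, conn: sqlite3.Connection) -> str:
    """Post-processing: tidy the output and confirm SQLite can plan the query."""
    sql = sql.strip().rstrip(";")
    conn.execute(f"EXPLAIN QUERY PLAN {sql}")  # raises sqlite3.Error if invalid
    return sql


def fake_llm(_prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return "SELECT name FROM singer WHERE age > 30;"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE concert (venue TEXT, year INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ana', 41), ('Bo', 25)")

prompt = preprocess("List the name of every singer older than 30", conn)
sql = postprocess(translate(prompt, fake_llm), conn)
print(conn.execute(sql).fetchall())  # [('Ana',)]
```

In this toy run the pre-processing step drops the `concert` table from the prompt, and the post-processing step would reject any output that SQLite cannot parse. Real systems replace each stage with far more capable components.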

Highlighted Details

  • Comprehensive catalog of over 100 NL2SQL papers, categorized by module (Pre-processing, Translation, Post-processing, Benchmark, Evaluation, Error Analysis).
  • Detailed analysis of NL2SQL challenges, categorized into five levels, with a focus on LLM-era advancements.
  • Practical guides for novices, covering data acquisition, model building with LLMs (fine-tuning, in-context learning), and model evaluation.
  • A "river diagram" illustrating the evolution of NL2SQL methods.
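
As a rough illustration of the model evaluation step mentioned above, the sketch below computes execution accuracy, a metric commonly used on text-to-SQL benchmarks: a prediction counts as correct when it returns the same rows as the reference query on the same database. The helper names and the sample queries are hypothetical, and the handbook's own evaluation guidance may use different or additional metrics.

```python
import sqlite3
from collections import Counter


def same_result(conn, predicted_sql, gold_sql):
    """True when both queries return the same multiset of rows (order ignored)."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to run counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose execution results match."""
    if not pairs:
        return 0.0
    hits = sum(same_result(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ana', 41), ('Bo', 25)")

pairs = [
    # Different text, same rows: correct.
    ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
    # Misspelled column: fails to execute.
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
    # Runs, but returns a different count.
    ("SELECT COUNT(*) FROM singer", "SELECT COUNT(*) FROM singer WHERE age > 30"),
    # Same rows in a different order: correct under multiset comparison.
    ("SELECT name FROM singer ORDER BY age", "SELECT name FROM singer"),
]
print(execution_accuracy(conn, pairs))  # 0.5
```

Comparing row multisets ignores row order, which is a simplification: for questions that require a specific ordering, an order-sensitive comparison would be more appropriate.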

Maintenance & Community

The project is associated with HKUSTDial and the authors of the survey paper "A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?". No specific community channels (like Discord/Slack) or active maintenance signals are explicitly mentioned in the README.

Licensing & Compatibility

The repository itself does not specify a license. The BibTeX entry indicates the survey paper is an arXiv preprint. Linked repositories carry their own licenses (e.g., Apache 2.0 for LitGPT), so suitability for commercial use depends on the licenses of the linked tools and of the specific NL2SQL methods discussed.

Limitations & Caveats

This repository is a survey and handbook, not a runnable NL2SQL system. Users must refer to the external, linked repositories for implementation details and tools. Because LLM-based NL2SQL is evolving rapidly, any given version of the handbook is a snapshot in time, even though it is continuously updated.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 87 stars in the last 30 days
