Agentar-Scale-SQL by antgroup

Text-to-SQL framework advancing performance via scalable computation

Created 5 months ago

371 stars

Top 76.7% on SourcePulse

Project Summary

Agentar-Scale-SQL is a Text-to-SQL framework that improves SQL generation accuracy through scalable computation and orchestrated test-time scaling. It targets researchers and engineers who need more accurate SQL generation, and its reported results narrow the gap to human-expert performance on complex benchmarks such as BIRD.

How It Works

The framework employs an "Orchestrated Test-Time Scaling" strategy that combines three distinct perspectives. By spending additional computation at inference time, it aims to improve accuracy and close the gap between state-of-the-art models and human experts on challenging Text-to-SQL tasks.
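To make the idea of test-time scaling concrete, here is a minimal sketch of one common form of it: sample several SQL candidates, execute each, and keep a candidate whose result agrees with the majority. This is a generic illustration and not the project's actual selection method, which this summary does not describe. The function names, the `singer` table, and the hard-coded candidate list (standing in for model outputs) are all assumptions made for the example.

```python
import sqlite3
from collections import defaultdict


def execute(conn, sql):
    """Run a query; return a hashable, order-insensitive result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def select_by_execution_vote(conn, candidates):
    """Return the first candidate belonging to the largest group of agreeing results."""
    groups = defaultdict(list)
    for sql in candidates:
        result = execute(conn, sql)
        if result is not None:  # discard candidates that fail to execute
            groups[result].append(sql)
    if not groups:
        return None
    # On ties, max() keeps the group that was inserted first.
    best_group = max(groups.values(), key=len)
    return best_group[0]


# Hypothetical toy database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bo', 25), (3, 'Cy', 40);
""")

# Stand-ins for SQL candidates sampled from a model (assumed, not real outputs).
candidates = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 31",
    "SELECT name FROM singer WHERE age > 25 ORDER BY name",
    "SELECT nme FROM singer",  # invalid column, fails to execute
    "SELECT name FROM singer WHERE age > 35",
]

chosen = select_by_execution_vote(conn, candidates)
print(chosen)  # SELECT name FROM singer WHERE age > 30
```

Generating more candidates costs more computation but gives the vote more evidence, which is the basic trade-off behind test-time scaling.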

Quick Start & Requirements

Installation involves creating a Python 3.10 Conda environment, installing specific PyTorch (CUDA 12.1) and vLLM versions, and then installing the project dependencies from requirements.txt. A Java environment is needed for DDL schema generation. Data preparation requires configuring dataset paths and column meaning files. Key links include the arXiv paper, the BIRD leaderboard (where the project ranks #1), and the model pages on Hugging Face and ModelScope.

Highlighted Details

Achieved #1 rank on the BIRD leaderboard with 81.67% execution accuracy.
Outperforms leading methods such as AskData + GPT-4o on test-set execution accuracy (EX) and R-VES.
Open-sourced code for the Light Schema Engine and Offline Data Preprocessing Pipeline.
Released the Agentar-Scale-SQL-Generation-32B model.
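
For readers unfamiliar with the execution accuracy (EX) metric cited above, the following is a minimal sketch of how such a score can be computed: a prediction counts as correct when it returns the same rows as the gold query. It assumes an order-insensitive comparison that keeps duplicates; the official BIRD evaluation script may differ in such details. The `orders` table, the query pairs, and the function names are hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Return rows in a canonical order, or None if the query fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose execution results match."""
    hits = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        if gold_rows is not None and pred_rows == gold_rows:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy database, used only for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
""")

# (predicted SQL, gold SQL) pairs; all invented for the example.
pairs = [
    ("SELECT COUNT(*) FROM orders", "SELECT COUNT(id) FROM orders"),
    ("SELECT id FROM orders WHERE amount > 15",
     "SELECT id FROM orders WHERE amount >= 20"),
    ("SELECT SUM(amount) FROM orders", "SELECT AVG(amount) FROM orders"),
    ("SELECT total FROM orders", "SELECT MAX(amount) FROM orders"),  # fails
]

ex = execution_accuracy(db, pairs)
print(f"{ex:.2%}")  # 50.00%
```

Because the comparison is on results and not on SQL text, two differently written queries count as equivalent whenever they return the same rows on the evaluation database.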

Maintenance & Community

A detailed roadmap outlines plans for releasing more models and code modules, including Task Understanding, SQL Candidate Generation, and SQL Selection. No direct community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The README does not explicitly state the repository's license, which is a significant adoption blocker for commercial or sensitive use cases. It also gives no compatibility notes for linking from closed-source software.

Limitations & Caveats

According to the roadmap, significant portions of the framework's code have not yet been open-sourced: Task Understanding, SQL Candidate Generation (ICL and Reasoning), and the SQL Selection module. The public release is therefore incomplete, and the project is still in active development.

Explore Similar Projects

BIRD-Interact by bird-bench

natural-sql by cfahlgren1

DuckDB-NSQL by NumbersStationAI

NSQL by NumbersStationAI

Table-Pretraining by microsoft

universal-db-mcp by Anarkh-Lee

XiYan-SQL by XGenerationLab

LLM-Text-to-SQL-Architectures by arunpshankar

CHESS by ShayanTalaei

TAG-Bench by TAG-Research

Awesome-LLM-based-Text2SQL by DEEP-PolyU

Spider2 by xlang-ai