Agentar-Scale-SQL by antgroup

Text-to-SQL framework advancing performance via scalable computation

Created 4 months ago
335 stars

Top 82.4% on SourcePulse


Summary

Agentar-Scale-SQL is a Text-to-SQL framework that improves SQL generation accuracy by spending more computation at inference time, a strategy the project calls Orchestrated Test-Time Scaling. It targets researchers and engineers who need higher accuracy on complex benchmarks, where it aims to narrow the gap to human-expert performance.

How It Works

The framework uses an "Orchestrated Test-Time Scaling" strategy that combines three distinct perspectives. The goal is to raise accuracy through scalable computation and so narrow the gap between state-of-the-art models and human experts on challenging Text-to-SQL tasks.
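To make the general idea of test-time scaling concrete, here is a minimal sketch of one common form of it: sample several candidate queries, execute each one, and keep a candidate whose result agrees with the majority. This is an illustration of the generic technique only, not the project's actual selection method, which this summary does not describe. The function names, the `orders` table, and the candidate queries are all hypothetical.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Run a query; return an order-insensitive, hashable result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def select_by_execution_vote(conn, candidates):
    """Return the first candidate whose execution result is the most common one."""
    results = [execute(conn, sql) for sql in candidates]
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None  # every candidate failed to execute
    winner, _ = votes.most_common(1)[0]
    return candidates[results.index(winner)]


# Hypothetical toy database and candidate pool (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.0), (3, 40.0);
    """
)
candidates = [
    "SELECT COUNT(*) FROM orders WHERE amount > 20",
    "SELECT COUNT(id) FROM orders WHERE amount > 20.0",
    "SELECT COUNT(*) FROM orders WHERE amount >= 10",  # different answer
    "SELECT COUNT(*) FROM order WHERE amount > 20",    # fails: reserved word
]
best = select_by_execution_vote(conn, candidates)
print(best)
```

Generating more candidates costs more computation but gives the vote more evidence, which is the basic trade-off behind scaling at test time.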

Quick Start & Requirements

Installation involves creating a Python 3.10 Conda environment, installing specific PyTorch (CUDA 12.1) and vLLM versions, and then installing the project dependencies from requirements.txt. A Java environment is needed for DDL schema generation. Data preparation requires configuring dataset paths and column meaning files. Key links include the arXiv paper, the BIRD leaderboard (where the project ranks #1), and the model pages on Hugging Face and ModelScope.

Highlighted Details

  • Achieved #1 rank on the BIRD leaderboard with 81.67% execution accuracy.
  • Outperforms leading methods such as AskData + GPT-4o on two key metrics: test-set execution accuracy (EX) and R-VES.
  • Open-sourced code for the Light Schema Engine and Offline Data Preprocessing Pipeline.
  • Released the Agentar-Scale-SQL-Generation-32B model.
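
Execution accuracy, the metric quoted above, can be sketched roughly as follows: a prediction counts as correct when running it returns the same rows as the reference query. This is a simplified illustration that assumes results are compared as unordered sets; the benchmark's official evaluation script remains the authority. The `employees` table, the query pairs, and the function names are hypothetical.

```python
import sqlite3


def result_set(conn, sql):
    """Execute a query and return its rows as a set, or None if it fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result sets match."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = result_set(conn, gold)
        if gold_rows is not None and result_set(conn, predicted) == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database and query pairs (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'eng', 120), ('Bob', 'eng', 100), ('Cy', 'ops', 90);
    """
)
pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT name FROM employees WHERE salary > 95",
     "SELECT name FROM employees WHERE salary >= 100"),
    ("SELECT COUNT(name) FROM employees WHERE dept = 'eng'",
     "SELECT COUNT(*) FROM employees WHERE dept = 'eng'"),
    # Returns an extra row: counted as wrong.
    ("SELECT name FROM employees WHERE dept = 'ops' OR salary > 110",
     "SELECT name FROM employees WHERE dept = 'ops'"),
    # Misspelled column, fails to execute: counted as wrong.
    ("SELECT nme FROM employees",
     "SELECT name FROM employees"),
]
accuracy = execution_accuracy(conn, pairs)
print(f"{accuracy:.2%}")
```

Because the comparison is on results rather than query text, two differently written queries can both be scored as correct.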

Maintenance & Community

A detailed roadmap outlines plans for releasing more models and code modules, including Task Understanding, SQL Candidate Generation, and SQL Selection. No direct community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The repository's license is not explicitly stated in the README, posing a significant adoption blocker for commercial or sensitive use cases. Compatibility notes for closed-source linking are absent.

Limitations & Caveats

Significant portions of the framework's code, specifically for Task Understanding, SQL Candidate Generation (ICL and Reasoning), and the SQL Selection module, are yet to be open-sourced according to the roadmap. This indicates the project is in active development with incomplete public code releases.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5
  • Star history: 39 stars in the last 30 days
