MAC-SQL  by wbbeyourself

Research paper for a multi-agent collaborative text-to-SQL framework

created 1 year ago
271 stars

Top 95.8% on sourcepulse

Project Summary

MAC-SQL is a multi-agent framework designed to tackle the Text-to-SQL problem, enabling more accurate and robust SQL query generation from natural language. It is targeted at researchers and developers working on natural language understanding and database interaction, offering a collaborative approach to improve performance on complex queries.

How It Works

The framework employs a three-agent collaborative system: a Selector, a Decomposer, and a Refiner. Each agent handles one stage of turning a natural language question and a database schema into SQL. The Selector narrows the schema to the tables and columns relevant to the question, the Decomposer breaks a complex question into simpler sub-questions and generates SQL for them step by step, and the Refiner executes the resulting SQL and corrects it when it fails, yielding a final, executable query. This modular design aims to improve accuracy and handle intricate query structures more effectively than a single-prompt approach.
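The hand-off between the three agents can be sketched as below. This is a minimal illustration and not the repository's code: every function name is hypothetical, the LLM calls are replaced by injected stub functions, the Selector is reduced to keyword matching, and the Refiner is modelled as an execute-and-repair loop against SQLite, which is an assumption about how it arrives at an executable query.

```python
import sqlite3

def selector(question, schema):
    """Keep tables whose name or columns appear in the question.
    Keyword matching stands in for an LLM call (hypothetical)."""
    words = set(question.lower().replace("?", " ").split())
    kept = {table: cols for table, cols in schema.items()
            if table.lower() in words or any(c.lower() in words for c in cols)}
    return kept or schema  # fall back to the full schema if nothing matched

def decomposer(question, sub_schema, generate_sql):
    """Delegate SQL generation to an injected generator (an LLM in practice)."""
    return generate_sql(question, sub_schema)

def refiner(conn, sql, repair_sql, max_rounds=3):
    """Execute the candidate; on a database error, ask for a repaired query."""
    for _ in range(max_rounds):
        try:
            conn.execute(sql).fetchall()
            return sql  # the query ran, so accept it
        except sqlite3.Error as exc:
            sql = repair_sql(sql, str(exc))
    return sql  # last repair attempt, not re-verified

def run_pipeline(conn, question, schema, generate_sql, repair_sql):
    sub_schema = selector(question, schema)
    candidate = decomposer(question, sub_schema, generate_sql)
    return refiner(conn, candidate, repair_sql)

# Toy database and stubbed "model" functions, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE concert (title TEXT, year INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ada', 31), ('Bo', 24)")
schema = {"singer": ["name", "age"], "concert": ["title", "year"]}

def fake_generate(question, sub_schema):
    table = next(iter(sub_schema))
    return f"SELECT nam FROM {table} WHERE age > 30"  # deliberate column typo

def fake_repair(sql, error_message):
    return sql.replace("nam ", "name ")  # pretend the model fixed the typo

final_sql = run_pipeline(conn, "Which singer has age over 30?", schema,
                         fake_generate, fake_repair)
print(final_sql)
```

Passing the generator and repair functions in as arguments keeps the control flow testable without any API access; in a real setup each would wrap a prompt to the configured model.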

Quick Start & Requirements

  • Install (run in order):
      conda create -n macsql python=3.9 -y
      conda activate macsql
      pip install -r requirements.txt
      python -c "import nltk; nltk.download('punkt')"
  • Prerequisites: Python 3.9, the NLTK punkt tokenizer, and OpenAI API access (default model: GPT-4-1106-preview). The OPENAI_API_BASE and OPENAI_API_KEY environment variables must be set.
  • Data: Download data.zip (BIRD and Spider datasets) from the provided Baidu Disk or Google Drive links and replace the existing data folder with its contents.
  • Demo: Run scripts/app_bird.py or scripts/app_spider.py for SQL execution demos.
  • Docs: Official Paper (cited as COLING 2025).
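
Because the demos depend on the two environment variables listed above, a quick pre-flight check can save a failed run. The helper below is a hypothetical sketch, not part of the repository; only the variable names OPENAI_API_BASE and OPENAI_API_KEY come from the requirements above.

```python
import os

# Variable names taken from the prerequisites; everything else is illustrative.
REQUIRED_VARS = ("OPENAI_API_BASE", "OPENAI_API_KEY")

def missing_settings(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_settings()
if missing:
    print("Set these environment variables before running the demos:",
          ", ".join(missing))
```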

Highlighted Details

  • Utilizes a multi-agent collaborative framework with Selector, Decomposer, and Refiner agents.
  • Supports evaluation on BIRD and Spider datasets using Execution Accuracy (EX) and Valid Efficiency Score (VES).
  • Offers integration with local LLMs (e.g., SQL-Llama) by uncommenting specific configurations.
  • Includes a bad_cases folder with examples of challenging queries.
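
To make the Execution Accuracy metric concrete, the sketch below scores a prediction as correct when it returns the same set of rows as the reference query on the same database. This is a simplified reading of EX under stated assumptions (SQLite, set comparison of rows, failed predictions count as misses); the function names are hypothetical, and the official BIRD and Spider evaluation scripts may differ in details. VES is not covered here.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same set of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to run counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    return set(predicted) == set(gold)

def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)

# Toy data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ada', 31), ('Bo', 24)")

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Runs, but returns extra rows: counted as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age > 30"),
    # Does not run (unknown column): counted as wrong.
    ("SELECT nam FROM singer",
     "SELECT name FROM singer"),
]
score = execution_accuracy(conn, pairs)
print(score)
```

Comparing results instead of SQL text is what lets differently written but equivalent queries count as correct.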

Maintenance & Community

The project is associated with authors from various institutions and has been accepted to COLING 2025. No specific community channels like Discord or Slack are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes, and commercial use or closed-source linking compatibility is not specified.

Limitations & Caveats

The framework depends on the OpenAI API and on an older release of the openai Python package (e.g., openai==0.28.1, which predates the 1.0 client interface), and it requires careful configuration of API keys and endpoints. The default model is GPT-4-1106-preview, and running with local models requires specific deployment steps.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 33 stars in the last 90 days
