MAC-SQL by wbbeyourself

Research paper and code for a multi-agent collaborative Text-to-SQL framework

Created 2 years ago

314 stars

Top 86.1% on SourcePulse

Project Summary

MAC-SQL is a multi-agent framework designed to tackle the Text-to-SQL problem, enabling more accurate and robust SQL query generation from natural language. It is targeted at researchers and developers working on natural language understanding and database interaction, offering a collaborative approach to improve performance on complex queries.

How It Works

The framework employs a three-agent collaborative system: a Selector, a Decomposer, and a Refiner. Each agent handles one part of the path from natural language question and database schema to SQL. The Selector prunes the schema to the tables and columns relevant to the question, the Decomposer breaks a complex question into simpler sub-questions and generates SQL for them step by step, and the Refiner executes the resulting query and repairs it when it fails, yielding a final, executable SQL query. This modular design aims to improve accuracy and handle intricate query structures more effectively than a single-prompt approach.
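The hand-off between the three agents can be pictured with a small offline sketch. Everything in it is an assumption made for illustration: the function names, the `Schema` and `LLM` types, the keyword-matching Selector, and the canned model reply are hypothetical and are not taken from the MAC-SQL code base, where each agent is driven by LLM prompts. The Refiner here only checks that the query runs against an empty SQLite copy of the schema.

```python
import sqlite3
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]   # hypothetical: table name -> column names
LLM = Callable[[str], str]      # hypothetical: prompt in, text out


def selector(question: str, schema: Schema) -> Schema:
    """Toy heuristic: keep tables whose name appears in the question."""
    q = question.lower()
    kept = {t: cols for t, cols in schema.items() if t.lower() in q}
    return kept or schema  # fall back to the full schema if nothing matches


def decomposer(question: str, schema: Schema, llm: LLM) -> str:
    """Ask the model for SQL, showing it only the pruned schema."""
    tables = "; ".join(f"{t}({', '.join(c)})" for t, c in schema.items())
    return llm(f"Schema: {tables}\nQuestion: {question}\nSQL:").strip()


def refiner(sql: str, schema: Schema) -> str:
    """Check that the SQL executes against an empty in-memory copy of the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, cols in schema.items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
        conn.execute(sql)  # raises sqlite3.Error if the query is invalid
        return sql
    finally:
        conn.close()


def run_pipeline(question: str, schema: Schema, llm: LLM) -> str:
    pruned = selector(question, schema)
    return refiner(decomposer(question, pruned, llm), pruned)


# Usage with a canned "model" so the sketch runs without any API access.
schema = {"singer": ["id", "name", "age"], "concert": ["id", "year"]}
fake_llm = lambda prompt: "SELECT name FROM singer WHERE age > 30"
final_sql = run_pipeline("Which singer is older than 30?", schema, fake_llm)
print(final_sql)
```

Passing the model in as a plain callable keeps the sketch independent of any particular client library; a real run would replace `fake_llm` with a function that calls the configured model.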

Quick Start & Requirements

Install (run in order):
    conda create -n macsql python=3.9 -y
    conda activate macsql
    pip install -r requirements.txt
    python -c "import nltk; nltk.download('punkt')"
Prerequisites: Python 3.9, NLTK punkt tokenizer, OpenAI API access (GPT-4-1106-preview default). Requires setting OPENAI_API_BASE and OPENAI_API_KEY environment variables.
Data: Download data.zip (BIRD and Spider datasets) from provided Baidu Disk or Google Drive links and replace the existing data folder.
Demo: Run scripts/app_bird.py or scripts/app_spider.py for SQL execution demos.
Docs: Official Paper (cited as COLING 2025).
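Since the prerequisites name two required environment variables, a quick pre-flight check can catch a missing one before a demo script is launched. This is a minimal sketch, not part of the repository; the helper name `missing_openai_vars` is hypothetical, and only the two variable names come from the summary above.

```python
import os
from typing import List, Mapping

# Variable names as listed in the prerequisites.
REQUIRED_VARS = ("OPENAI_API_BASE", "OPENAI_API_KEY")


def missing_openai_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_openai_vars()
if missing:
    print("Set these variables before running:", ", ".join(missing))
else:
    print("OpenAI environment variables are set.")
```

The check only confirms that the variables are non-empty; it does not validate the key or contact the endpoint.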

Highlighted Details

Utilizes a multi-agent collaborative framework with Selector, Decomposer, and Refiner agents.
Supports evaluation on BIRD and Spider datasets using Execution Accuracy (EX) and Valid Efficiency Score (VES).
Offers integration with local LLMs (e.g., SQL-Llama) by uncommenting specific configurations.
Includes a bad_cases folder with examples of challenging queries.
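To make the Execution Accuracy (EX) metric concrete: a prediction counts as correct when running it returns the same result as running the gold query, and EX is the fraction of correct predictions. The sketch below assumes results are compared as unordered sets of rows and that a prediction that fails to execute counts as wrong; the function names and the tiny `singer` table are hypothetical, and the repository's own evaluation scripts may differ in detail. VES is not covered here.

```python
import sqlite3
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same set of rows."""
    try:
        predicted = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        return False  # assumption: a query that fails to run is scored as wrong
    gold = set(conn.execute(gold_sql).fetchall())
    return predicted == gold


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Toy database and three predictions: one match, one mismatch, one invalid query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 41), ("Bo", 25)])
pairs = [
    ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
    ("SELECT name FROM singer", "SELECT name FROM singer WHERE age > 30"),
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]
print(f"EX = {execution_accuracy(conn, pairs):.3f}")
```

Comparing sets ignores row order and duplicate rows, which is a deliberate simplification in this sketch.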

Maintenance & Community

The project is associated with authors from various institutions and has been accepted to COLING 2025. No specific community channels like Discord or Slack are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes, and commercial use or closed-source linking compatibility is not specified.

Limitations & Caveats

The framework depends on the OpenAI API and uses an older release of the openai Python package (e.g., openai==0.28.1), so API keys and endpoints must be configured carefully. The default model is GPT-4-1106-preview; running with local models requires additional deployment steps.
