ShayanTalaei: LLM-powered multi-agent framework for efficient SQL synthesis
Contextual Harnessing for Efficient SQL Synthesis (CHESS) addresses the long-standing challenge of translating natural-language questions into SQL queries, particularly over large database catalogs and with ambiguous phrasing. It is an LLM-based multi-agent framework designed for efficient, scalable, and accurate SQL generation, aimed at text-to-SQL researchers and engineers who need solutions robust enough for industrial use.
How It Works
CHESS uses a modular multi-agent architecture with four specialized agents: the Information Retriever (IR) extracts relevant data, the Schema Selector (SS) prunes large schemas, the Candidate Generator (CG) produces and iteratively refines candidate queries, and the Unit Tester (UT) validates candidates with LLM-based tests. Together they address large database catalogs, schema reasoning, query validity, and natural-language ambiguity. The Schema Selector is a key differentiator: it cuts LLM token usage fivefold while also improving accuracy.
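To make the division of labor concrete, here is a minimal, dependency-free Python sketch of how state might flow through the four agents. Every name in it (TaskState, information_retriever, schema_selector, candidate_generator, unit_tester, run_pipeline) is hypothetical, and simple keyword matching, templated queries, and string-predicate tests stand in for the LLM calls the real agents make. It is not the repository's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical shared state passed from agent to agent (not the repository's classes).
@dataclass
class TaskState:
    question: str
    schema: dict  # table name -> list of column names
    keywords: list = field(default_factory=list)
    pruned_schema: dict = field(default_factory=dict)
    candidates: list = field(default_factory=list)
    final_sql: Optional[str] = None


def information_retriever(state: TaskState) -> TaskState:
    # Stand-in for IR: tokenize the question into lowercase keywords.
    state.keywords = [w.strip("?,.").lower() for w in state.question.split()]
    return state


def schema_selector(state: TaskState) -> TaskState:
    # Stand-in for SS: keep only tables whose name or any column matches a keyword.
    kw = set(state.keywords)
    state.pruned_schema = {
        table: cols
        for table, cols in state.schema.items()
        if table in kw or any(c in kw for c in cols)
    }
    return state


def candidate_generator(state: TaskState) -> TaskState:
    # Stand-in for CG: enumerate templated queries instead of sampling an LLM.
    kw = set(state.keywords)
    for table, cols in state.pruned_schema.items():
        state.candidates += [f"SELECT {c} FROM {table}" for c in cols]
        state.candidates += [
            f"SELECT {cols[0]} FROM {table} ORDER BY {c} DESC" for c in cols if c in kw
        ]
    return state


def unit_tester(state: TaskState, tests: list) -> TaskState:
    # Stand-in for UT: choose the candidate passing the most tests (first one wins ties).
    if state.candidates:
        state.final_sql = max(state.candidates, key=lambda sql: sum(t(sql) for t in tests))
    return state


def run_pipeline(question: str, schema: dict, tests: list) -> TaskState:
    state = TaskState(question=question, schema=schema)
    for agent in (information_retriever, schema_selector, candidate_generator):
        state = agent(state)
    return unit_tester(state, tests)


schema = {
    "schools": ["name", "city", "enrollment"],
    "teachers": ["name", "salary"],
    "districts": ["id", "budget"],
}
# Hypothetical "unit tests": predicates over the SQL text.
tests = [
    lambda sql: "FROM schools" in sql,
    lambda sql: "ORDER BY enrollment DESC" in sql,
    lambda sql: sql.startswith("SELECT name"),
]
state = run_pipeline("Which schools in the city have the highest enrollment?", schema, tests)
print(state.pruned_schema)  # {'schools': ['name', 'city', 'enrollment']}
print(state.final_sql)      # SELECT name FROM schools ORDER BY enrollment DESC
```

Passing a single state object between agents keeps each stage swappable, in the spirit of the modular design described above. The two run scripts named in the README hint that the repository composes different agent subsets in a similar way, though that is an inference from their file names.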
Quick Start & Requirements
To install, clone the repository, create a .env file containing the required API keys (OpenAI, Google Cloud) and configuration paths, and install the dependencies with pip install -r requirements.txt. The preprocessing step (sh run/run_preprocess.sh) is mandatory: it builds the database indexes (MinHash, LSH, and vector). The main pipelines are then launched with sh run/run_main_ir_cg_ut.sh or sh run/run_main_ir_ss_ch.sh.
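The README does not describe how its MinHash and LSH indexes are built. As a rough illustration of the general technique only, the sketch below indexes column values by MinHash signatures over character 3-grams and uses LSH banding, so a lookup returns stored values that share at least one identical band with the query, which tolerates small spelling differences. The class name MinHashLSH, the 3-gram shingling, and the 128-permutation/64-band parameters are assumptions for this example. The idea that the Information Retriever uses such lookups to match question terms to database values is likewise an assumption, not something the README states.

```python
import random
import zlib
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used for universal hashing


def shingles(text: str, n: int = 3) -> set:
    # Character n-grams of the lowercased value; strings shorter than n become one shingle.
    t = text.lower()
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}


class MinHashLSH:
    """Illustrative MinHash + LSH banding index (hypothetical, not the repository's code)."""

    def __init__(self, num_perm: int = 128, bands: int = 64, seed: int = 0):
        if num_perm % bands != 0:
            raise ValueError("num_perm must be divisible by bands")
        rng = random.Random(seed)  # fixed seed -> reproducible hash functions
        self.coeffs = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_perm)]
        self.bands = bands
        self.rows = num_perm // bands
        self.buckets = defaultdict(set)  # (band index, band signature) -> stored values

    def signature(self, text: str) -> list:
        # For each hash function h(x) = (a*x + b) mod PRIME, keep the minimum over all shingles.
        ids = [zlib.crc32(s.encode("utf-8")) for s in shingles(text)]
        return [min((a * x + b) % PRIME for x in ids) for a, b in self.coeffs]

    def _bands(self, sig: list):
        for i in range(self.bands):
            yield i, tuple(sig[i * self.rows:(i + 1) * self.rows])

    def add(self, value: str) -> None:
        for key in self._bands(self.signature(value)):
            self.buckets[key].add(value)

    def query(self, text: str) -> set:
        # Candidates are stored values sharing at least one identical band with the query.
        found = set()
        for key in self._bands(self.signature(text)):
            found |= self.buckets.get(key, set())
        return found


idx = MinHashLSH()
for city in ["Los Angeles", "San Francisco", "San Diego", "Sacramento"]:
    idx.add(city)
# Misspelled lookup: "Los Angeles" is returned with high probability.
print(idx.query("los angles"))
```

More bands with fewer rows each raise recall for loosely similar strings at the cost of more false candidates, so LSH results are typically re-ranked with an exact similarity measure.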
Maintenance & Community
The provided README does not contain information regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps.
Licensing & Compatibility
The README does not specify a software license. Prospective adopters should confirm the licensing terms before use, especially for commercial use or integration with closed-source systems.
Limitations & Caveats
Setup requires specific API keys (OpenAI, Google Cloud) and a multi-step preprocessing phase. Performance metrics are reported on the BIRD dataset; applying the framework to other datasets or LLMs may require modifications to run/langchain_utils.py.
Status: inactive (last activity 6 months ago)
Similar projects by:
cfahlgren1
defog-ai
Canner