suql by stanford-oval

Research paper for conversational search over hybrid data sources

Created 2 years ago

294 stars

Top 90.1% on SourcePulse

Project Summary

SUQL (Structured and Unstructured Query Language) enables conversational search over hybrid datasets, combining structured (SQL) and unstructured (text) data. It's designed for developers building chatbots and agents that need to query complex knowledge bases, offering a precise and expressive language for hybrid data access.

How It Works

SUQL integrates retrieval models, LLMs, and traditional SQL behind a single query interface. Dense vector indexing (via FAISS) handles retrieval over unstructured text, while SQL handles structured data. Key optimizations minimize expensive LLM calls. SUQL also supports complex SQL operations such as JOINs and GROUP BYs, and it is designed to scale to large PostgreSQL databases.
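The idea of minimizing LLM calls can be illustrated with a small sketch. This is not SUQL's implementation: it assumes an in-memory SQLite table in place of PostgreSQL, and `answer_stub`, `hybrid_search`, and the sample rows are hypothetical names and data invented for illustration. The cheap structured filter runs first in SQL, and the costly free-text predicate is evaluated lazily, stopping once enough matches are found.

```python
import sqlite3

# Counter for calls to the (stubbed) expensive text predicate.
stats = {"llm_calls": 0}


def answer_stub(text: str, question: str) -> str:
    """Hypothetical stand-in for an LLM call: a keyword match, not a model."""
    stats["llm_calls"] += 1
    return "Yes" if "family" in text.lower() else "No"


def hybrid_search(conn, min_rating, question, limit):
    """Filter on structured columns in SQL, then apply the text predicate lazily."""
    rows = conn.execute(
        "SELECT name, reviews FROM restaurants WHERE rating >= ? ORDER BY name",
        (min_rating,),
    )
    hits = []
    for name, reviews in rows:
        if answer_stub(reviews, question) == "Yes":
            hits.append(name)
            if len(hits) == limit:
                break  # stop early: no further predicate calls are needed
    return hits


# Assumption: SQLite stands in for PostgreSQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, rating REAL, reviews TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [
        ("Alpha Diner", 4.5, "Great for family dinners."),
        ("Beta Bar", 3.0, "Family crowd, slow service."),
        ("Gamma Grill", 4.0, "Loud late-night spot."),
        ("Delta Cafe", 4.2, "Family friendly with a kids menu."),
    ],
)

results = hybrid_search(conn, 4.0, "Is it family-friendly?", limit=1)
print(results, stats["llm_calls"])
```

With four rows in the table, the stubbed predicate runs only once here, because the rating filter removes one row before any text is inspected and the limit ends the scan at the first match.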

Quick Start & Requirements

Installation: Available via pip (pip install suql) or from source. Detailed instructions are in install_pip.md and install_source.md.
Prerequisites: Python, PostgreSQL, FAISS. Specific versions are not detailed in the README.
Resources: Requires a PostgreSQL database and potentially significant computational resources for LLM interactions and vector indexing.
Demos/Docs: Online demo available at https://yelpbot.genie.stanford.edu. Paper: https://arxiv.org/abs/2311.09818.
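Before following the install guides, it can help to check which prerequisites are already present. The sketch below is a convenience script, not part of SUQL. It assumes the pip package is importable as `suql`, that FAISS is importable as `faiss`, and that a PostgreSQL client is exposed as `psql` on the PATH. The function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which prerequisites look available in this environment."""
    return {
        # Assumption: these are the import names of the installed packages.
        "suql": importlib.util.find_spec("suql") is not None,
        "faiss": importlib.util.find_spec("faiss") is not None,
        # Assumption: a PostgreSQL client install puts `psql` on the PATH.
        "psql": shutil.which("psql") is not None,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

A "missing" entry only means the name was not found locally; install_pip.md and install_source.md remain the authoritative setup instructions.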

Highlighted Details

Extends SQL with free-text primitives such as SUMMARY and ANSWER, which can be used inside otherwise ordinary SQL queries.
Optimized to reduce LLM inference costs.
Scalable to large databases, demonstrated with PostgreSQL.
Achieves strong performance on the HybridQA dataset and a custom Yelp dataset for conversational agents.
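To show how free-text primitives can sit inside a SQL query, the sketch below registers toy `answer` and `summary` functions with SQLite and calls them from a SELECT statement. It only mimics the shape of such a query. The function bodies are keyword and string stubs rather than LLM calls, SQLite stands in for PostgreSQL, and the table and rows are invented for illustration. SUQL's actual syntax and execution are described in the project's documentation and paper.

```python
import sqlite3


def answer(text, question):
    """Toy stand-in for an LLM-backed answer: keyword match only."""
    return "Yes" if "family" in text.lower() else "No"


def summary(text):
    """Toy stand-in for an LLM summary: keep the first sentence."""
    return text.split(".")[0].strip() + "."


conn = sqlite3.connect(":memory:")
# Register the stubs as SQL-callable functions (name, argument count, callable).
conn.create_function("answer", 2, answer)
conn.create_function("summary", 1, summary)

conn.execute("CREATE TABLE restaurants (name TEXT, rating REAL, reviews TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [
        ("Alpha Diner", 4.5, "Great for family dinners. Parking is tight."),
        ("Gamma Grill", 4.0, "Loud late-night spot. Good burgers."),
    ],
)

# Structured condition (rating) and free-text condition (answer) in one query.
query = """
SELECT name, summary(reviews)
FROM restaurants
WHERE rating >= 4
  AND answer(reviews, 'Is this a family-friendly restaurant?') = 'Yes'
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The point of the design is that the caller writes one declarative query, and the engine decides how to combine the structured filter with the text-based functions.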

Maintenance & Community

The project originates from Stanford University. Contributions are welcomed via Issues and Pull Requests. Further details on best practices for agents are in conv_agent.md.

Licensing & Compatibility

The README does not explicitly state a license. Given the project's Stanford origin and academic publication, a permissive license (e.g., MIT or Apache 2.0) is plausible, but this is unconfirmed and should be verified against the repository before use.

Limitations & Caveats

Apart from pointing to a known_issues.md file, the README does not detail specific limitations, unsupported platforms, or known bugs. The performance claims rest on specific datasets (HybridQA and a custom Yelp dataset), so results may differ in real-world applications.

Explore Similar Projects

natural-sql by cfahlgren1

DuckDB-NSQL by NumbersStationAI

XiYan-SQL by XGenerationLab

rookie_text2data by jaguarliuu

PathRAG by BUPT-GAMMA

TAG-Bench by TAG-Research

Awesome-LLM-based-Text2SQL by DEEP-PolyU

Spider2 by xlang-ai

llm-driven-data-engineering by DataExpert-io

supersonic by tencentmusic

WrenAI by Canner

mindsdb by mindsdb