suql by stanford-oval

Research paper for conversational search over hybrid data sources

created 1 year ago
277 stars

Top 94.5% on sourcepulse

Project Summary

SUQL (Structured and Unstructured Query Language) enables conversational search over hybrid datasets, combining structured (SQL) and unstructured (text) data. It's designed for developers building chatbots and agents that need to query complex knowledge bases, offering a precise and expressive language for hybrid data access.

How It Works

SUQL combines retrieval models, LLMs, and traditional SQL behind a single query interface. Structured fields are queried with SQL, while unstructured text is retrieved through dense vector indexes built with FAISS. Key optimizations minimize expensive LLM calls. SUQL also supports complex SQL operations such as JOIN and GROUP BY, and it scales to large PostgreSQL databases.
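The idea of minimizing LLM calls can be illustrated with a small sketch: evaluate the cheap structured predicate first, then run the expensive free-text predicate only on the rows that survive, and stop once enough matches are found. This is not SUQL's actual implementation. The `Restaurant` type, `hybrid_filter`, and `fake_answer` (a keyword check standing in for an LLM call) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Restaurant:
    # Hypothetical row: one structured field and one free-text field.
    name: str
    rating: float
    reviews: List[str]


def hybrid_filter(
    rows: List[Restaurant],
    structured_pred: Callable[[Restaurant], bool],
    text_pred: Callable[[Restaurant], bool],
    limit: int,
) -> List[Restaurant]:
    """Run the cheap structured check first, the costly text check second."""
    matches: List[Restaurant] = []
    for row in rows:
        if not structured_pred(row):
            continue  # rejected without touching the expensive predicate
        if text_pred(row):
            matches.append(row)
            if len(matches) >= limit:
                break  # stop early once LIMIT is satisfied
    return matches


llm_calls = 0


def fake_answer(row: Restaurant) -> bool:
    # Stand-in for an LLM call (assumption): a simple keyword match.
    global llm_calls
    llm_calls += 1
    return any("family" in review.lower() for review in row.reviews)


ROWS = [
    Restaurant("A", 3.0, ["Great for family dinners"]),
    Restaurant("B", 4.5, ["Loud bar, not for kids"]),
    Restaurant("C", 4.5, ["Very family friendly"]),
    Restaurant("D", 4.5, ["Family brunch spot"]),
]

result = hybrid_filter(ROWS, lambda r: r.rating >= 4.0, fake_answer, limit=1)
print([r.name for r in result], llm_calls)
```

In this toy run the text predicate is evaluated for only two of the four rows: row A is rejected by the rating filter, and row D is never reached because the limit is already met.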

Quick Start & Requirements

  • Installation: Available via pip (pip install suql) or from source. Detailed instructions are in install_pip.md and install_source.md.
  • Prerequisites: Python, PostgreSQL, FAISS. Specific versions are not detailed in the README.
  • Resources: Requires a PostgreSQL database and potentially significant computational resources for LLM interactions and vector indexing.
  • Demos/Docs: Online demo available at https://yelpbot.genie.stanford.edu. Paper: https://arxiv.org/abs/2311.09818.

Highlighted Details

  • Extends SQL with free-text primitives such as SUMMARY and ANSWER, so conditions on unstructured text can sit alongside ordinary SQL clauses.
  • Optimized to reduce LLM inference costs.
  • Scalable to large databases, demonstrated with PostgreSQL.
  • Achieves strong performance on the HybridQA dataset and a custom Yelp dataset for conversational agents.
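As a rough illustration of the free-text primitives, a query might mix an ordinary SQL condition with `answer` and `summary` calls over a text column. The sketch below only holds such a query in a Python string. The `restaurants` table and its `name`, `rating`, and `reviews` columns are assumptions, and the exact syntax should be checked against the project's own documentation.

```python
# Illustrative SUQL-style query (assumed schema: restaurants(name, rating, reviews)).
# `rating >= 4.0` is a plain SQL condition; `answer(...)` and `summary(...)`
# are the free-text primitives applied to the unstructured `reviews` column.
QUERY = """
SELECT name, summary(reviews)
FROM restaurants
WHERE rating >= 4.0
  AND answer(reviews, 'is this restaurant family-friendly?') = 'Yes'
LIMIT 3;
"""

print(QUERY.strip())
```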

Maintenance & Community

The project originates from Stanford University. Contributions are welcome via Issues and Pull Requests. Best practices for building agents are described in conv_agent.md.

Licensing & Compatibility

The README does not explicitly state the license. Given the project's Stanford origin and academic publication, a permissive license (e.g., MIT or Apache 2.0) is plausible, but this is unconfirmed and should be verified against the repository before use.

Limitations & Caveats

Beyond pointing to a known_issues.md file, the README does not detail specific limitations, unsupported platforms, or known bugs. The performance claims are based on specific datasets (HybridQA and a custom Yelp dataset), and results may differ in real-world applications.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0

Star History

  • 15 stars in the last 90 days
