suql  by stanford-oval

Research paper for conversational search over hybrid data sources

Created 1 year ago
281 stars

Top 92.7% on SourcePulse

Project Summary

SUQL (Structured and Unstructured Query Language) enables conversational search over hybrid datasets, combining structured (SQL) and unstructured (text) data. It's designed for developers building chatbots and agents that need to query complex knowledge bases, offering a precise and expressive language for hybrid data access.

How It Works

SUQL integrates retrieval models, LLMs, and traditional SQL behind a unified interface. It uses dense vector indexing (via FAISS) for efficient retrieval over unstructured data and SQL for structured data. Key optimizations minimize expensive LLM calls, which helps it scale to large PostgreSQL databases, and it supports complex SQL operations such as JOIN and GROUP BY.
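To make the idea of minimizing LLM calls concrete, here is a minimal Python sketch. It is not SUQL's actual optimizer: the `Restaurant` record, the `CountingAnswerer` class (a keyword-matching stand-in for an LLM), and `hybrid_search` are all hypothetical names invented for illustration. The sketch only shows the general pattern of applying cheap structured filters first, then calling the expensive free-text check on the surviving rows and stopping once enough matches are found.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    # Hypothetical record mixing structured fields and free text.
    name: str
    cuisine: str
    rating: float
    reviews: str


class CountingAnswerer:
    """Stand-in for an LLM: keyword matching, with a call counter and a cache."""

    def __init__(self):
        self.calls = 0
        self._cache = {}

    def answer(self, text, keyword):
        key = (text, keyword)
        if key not in self._cache:
            self.calls += 1  # each cache miss models one expensive LLM call
            self._cache[key] = "Yes" if keyword in text.lower() else "No"
        return self._cache[key]


def hybrid_search(rows, cuisine, min_rating, keyword, answerer, limit=1):
    # Step 1: cheap structured predicates shrink the candidate set.
    candidates = [r for r in rows if r.cuisine == cuisine and r.rating >= min_rating]
    # Step 2: the expensive free-text check runs only on the survivors,
    # and stops as soon as `limit` matches have been collected.
    results = []
    for row in candidates:
        if answerer.answer(row.reviews, keyword) == "Yes":
            results.append(row)
            if len(results) >= limit:
                break
    return results


ROWS = [
    Restaurant("Luigi's", "italian", 4.5, "Great pasta, very family-friendly staff."),
    Restaurant("Sakura", "japanese", 4.8, "Family-friendly and quiet."),
    Restaurant("Trattoria Roma", "italian", 3.2, "Family-friendly but slow service."),
    Restaurant("Il Forno", "italian", 4.1, "Romantic spot, not ideal for kids."),
]

if __name__ == "__main__":
    llm = CountingAnswerer()
    hits = hybrid_search(ROWS, "italian", 4.0, "family-friendly", llm, limit=1)
    print([r.name for r in hits], "LLM calls:", llm.calls)
```

With four rows in the toy table, only the rows that pass the structured filter ever reach the stand-in LLM, and the `limit` cut-off avoids checking the rest once a match is found.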

Quick Start & Requirements

  • Installation: Available via pip (pip install suql) or from source. Detailed instructions are in install_pip.md and install_source.md.
  • Prerequisites: Python, PostgreSQL, FAISS. Specific versions are not detailed in the README.
  • Resources: Requires a PostgreSQL database and potentially significant computational resources for LLM interactions and vector indexing.
  • Demos/Docs: Online demo available at https://yelpbot.genie.stanford.edu. Paper: https://arxiv.org/abs/2311.09818.

Highlighted Details

  • Extends SQL with free-text primitives like SUMMARY and ANSWER for seamless integration.
  • Optimized to reduce LLM inference costs.
  • Scalable to large databases, demonstrated with PostgreSQL.
  • Achieves strong performance on the HybridQA dataset and a custom Yelp dataset for conversational agents.
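As a rough illustration of what free-text primitives inside SQL can look like, the sketch below registers toy `answer` and `summary` functions with Python's built-in `sqlite3` module and calls them from an ordinary query. Several things here are assumptions rather than SUQL's real behavior: SQLite stands in for PostgreSQL, the two functions are simple string heuristics rather than LLM calls, and the table, rows, and query are invented for the example.

```python
import sqlite3


def answer(text, question):
    """Toy stand-in for an LLM answer: checks whether the question's last word appears in the text."""
    keyword = question.lower().rstrip("?").split()[-1]
    return "Yes" if keyword in text.lower() else "No"


def summary(text):
    """Toy stand-in for an LLM summary: keeps only the first sentence."""
    return text.split(".")[0].strip() + "."


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    # Expose the free-text helpers as SQL functions (name, argument count, callable).
    conn.create_function("answer", 2, answer)
    conn.create_function("summary", 1, summary)
    conn.execute(
        "CREATE TABLE restaurants (name TEXT, cuisine TEXT, rating REAL, reviews TEXT)"
    )
    conn.executemany(
        "INSERT INTO restaurants VALUES (?, ?, ?, ?)",
        [
            ("Luigi's", "italian", 4.5, "Great pasta. Staff are very family-friendly."),
            ("Il Forno", "italian", 4.1, "Romantic spot. Not ideal for kids."),
            ("Sakura", "japanese", 4.8, "Quiet and family-friendly. Fresh sushi."),
        ],
    )
    return conn


# Hypothetical query mixing structured predicates with free-text primitives.
QUERY = """
SELECT name, summary(reviews)
FROM restaurants
WHERE cuisine = 'italian'
  AND rating >= 4.0
  AND answer(reviews, 'is this restaurant family-friendly?') = 'Yes'
ORDER BY rating DESC
LIMIT 1
"""


def run_demo():
    conn = build_demo_db()
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is only the shape of the query: structured filters and free-text predicates sit side by side in one statement. How SUQL itself compiles and executes such queries is described in the paper and the repository documentation.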

Maintenance & Community

The project originates from Stanford University. Contributions are welcomed via Issues and Pull Requests. Further details on best practices for agents are in conv_agent.md.

Licensing & Compatibility

The README does not explicitly state the license. Given its Stanford origin and academic publication, it is likely to be permissive (e.g., MIT, Apache 2.0), but this requires verification.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known bugs; it only points to a known_issues.md file. The performance claims are based on specific datasets (HybridQA and a custom Yelp dataset) and may not carry over to other workloads.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
