MindSQL by Mindinventory

Python RAG library for database interaction via natural language

created 1 year ago
405 stars

Top 72.8% on sourcepulse

Project Summary

MindSQL is a Python library designed to simplify database interactions for developers and data analysts by enabling natural language queries. It leverages Retrieval-Augmented Generation (RAG) with large language models (LLMs) to translate user questions into SQL, supporting a wide range of databases and vector stores.

How It Works

MindSQL employs a RAG architecture, indexing database schema (DDL) and example question-SQL pairs into a vector store (ChromaDB, Faiss). When a user asks a question, the system retrieves relevant schema and examples from the vector store to provide context to an LLM (GPT-4, Llama 2, Gemini). The LLM then generates an SQL query, which is executed against the specified database (PostgreSQL, MySQL, SQLite, Snowflake, BigQuery). This approach aims to improve query accuracy and reduce the need for users to know SQL syntax.
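
The flow described above (index, retrieve, prompt, generate, execute) can be sketched with the standard library alone. This is an illustrative sketch, not MindSQL's API: ToyVectorStore, build_prompt, fake_llm, and ask are hypothetical names, the bag-of-words similarity stands in for a real embedding store such as ChromaDB or Faiss, and fake_llm returns a canned query where a real LLM call would go.

```python
import re
import sqlite3
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical stand-in for a vector store; uses word counts, not embeddings."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        self.docs.append((text, Counter(tokenize(text))))

    def search(self, query, k=2):
        q = Counter(tokenize(query))
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, context):
    """Combine the retrieved schema and examples with the user's question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nSQL:"


def fake_llm(prompt):
    """Assumption: a real LLM call goes here. This stub returns a fixed query."""
    return "SELECT COUNT(*) FROM customers WHERE country = 'DE'"


def ask(question, store, llm, conn):
    """Retrieve context, generate SQL, and run it against the database."""
    context = store.search(question)
    sql = llm(build_prompt(question, context))
    return sql, conn.execute(sql).fetchall()


# A small in-memory database to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ada", "DE"), ("Bo", "US"), ("Cy", "DE")],
)

# Index the schema (DDL) and one example question-SQL pair.
store = ToyVectorStore()
store.add("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
store.add("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL)")
store.add("Q: How many customers are in the US? "
          "SQL: SELECT COUNT(*) FROM customers WHERE country = 'US'")

question = "How many customers are in DE?"
sql, rows = ask(question, store, fake_llm, conn)
print(sql)
print(rows)
```

Because the LLM is passed in as a plain callable, the stub can be swapped for a real model client without changing the retrieval or execution steps.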

Quick Start & Requirements

  • Install via pip: pip install mindsql
  • Requires Python 3.10 or higher.
  • Supports PostgreSQL, MySQL, SQLite, Snowflake, and BigQuery.
  • Integrates with ChromaDB and Faiss for vector storage.
  • Utilizes LLMs like GPT-4, Llama 2, and Google Gemini.
  • Official documentation and usage examples are available in the README.

Highlighted Details

  • Supports multiple SQL databases including cloud platforms like Snowflake and BigQuery.
  • Integrates with popular vector stores for efficient context retrieval.
  • Offers visualization of query results as charts.
  • Allows indexing of both database schema and example question-SQL pairs.
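
As a rough illustration of the last point, the two kinds of indexed material can be gathered as shown here for SQLite. This is a sketch under stated assumptions, not MindSQL's own indexing code: collect_ddl and collect_examples are hypothetical helpers, and the "question" and "sql" JSON keys are an assumed file layout. The sqlite_master catalog table is a standard SQLite feature that stores each table's CREATE statement.

```python
import json
import sqlite3


def collect_ddl(conn):
    """Read each user table's CREATE statement from SQLite's catalog."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [{"kind": "ddl", "table": name, "text": sql} for name, sql in rows]


def collect_examples(raw_json):
    """Turn question-SQL pairs (assumed JSON layout) into indexable text."""
    pairs = json.loads(raw_json)
    return [
        {"kind": "example", "text": f"Question: {p['question']}\nSQL: {p['sql']}"}
        for p in pairs
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

examples_json = (
    '[{"question": "What is the total revenue?", '
    '"sql": "SELECT SUM(total) FROM orders"}]'
)

# These documents are what a vector store would embed and index.
documents = collect_ddl(conn) + collect_examples(examples_json)
for doc in documents:
    print(doc["kind"], "->", doc["text"].splitlines()[0])
```

Other databases expose their schema differently (for example through information_schema), so the catalog query is the part that would change per backend.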

Maintenance & Community

The project appears to be actively maintained by Mindinventory. The README explains how to contribute, report bugs, and request features.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because MindSQL relies on RAG, the quality of the generated SQL depends on the quality of the indexed schema and example pairs and on the capability of the underlying LLM. The README provides no performance benchmarks and does not describe error-handling strategies.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 4

Star History

  • 42 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Travis Fischer (founder of Agentic), and 1 more.

vanna by vanna-ai

0.5%
20k
Python RAG framework for SQL generation
created 2 years ago
updated 3 months ago