vanna by vanna-ai

Python RAG framework for SQL generation

Created 2 years ago
20,395 stars

Top 2.2% on SourcePulse

Project Summary

Vanna is an open-source Python RAG framework designed to enable users to interact with SQL databases using natural language. It targets data analysts, engineers, and business users who need to query data without writing SQL, offering accurate text-to-SQL generation and visualization capabilities.

How It Works

Vanna uses a Retrieval-Augmented Generation (RAG) approach. It first "trains" on your database by ingesting the schema (DDL), documentation, and existing SQL queries. "Training" here means indexing: the material is stored in a vector database, and no LLM weights are updated. When a user asks a question, Vanna retrieves the relevant schema and documentation context from the vector store and uses it to prompt the chosen Large Language Model (LLM) to generate a SQL query. Compared with fine-tuning, this approach is portable across LLMs, lets training data be added or removed without retraining, costs less to run, and makes it easier to switch to newer models as they appear.
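The retrieve-then-prompt flow described above can be sketched in a few lines. This is a toy illustration, not Vanna's implementation: `ToyStore`, `tokenize`, `cosine`, and `build_prompt` are hypothetical names, a bag-of-words vector stands in for a real embedding model, and the final LLM call is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ToyStore:
    """Hypothetical in-memory 'vector store' holding (kind, text, vector) items."""
    items: list = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        # "Training" step: index the text, no model weights involved.
        self.items.append((kind, text, tokenize(text)))

    def retrieve(self, question: str, k: int = 2) -> list:
        # Rank stored items by similarity to the question and keep the top k.
        q = tokenize(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[2]), reverse=True)
        return [(kind, text) for kind, text, _ in ranked[:k]]


def build_prompt(store: ToyStore, question: str, k: int = 2) -> str:
    """Assemble the prompt that would be sent to the chosen LLM."""
    context = "\n".join(f"[{kind}] {text}" for kind, text in store.retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


store = ToyStore()
store.add("ddl", "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at DATE)")
store.add("ddl", "CREATE TABLE employees (id INT, name TEXT, department TEXT)")
store.add("documentation", "The total column in orders is the order value in USD")

prompt = build_prompt(store, "What is the total value of orders per customer?", k=2)
print(prompt)
```

Only the two items related to orders reach the prompt; the unrelated `employees` table is left out, which keeps the prompt short and focused on the relevant schema.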

Quick Start & Requirements

  • Install: pip install vanna
  • Prerequisites: Requires API keys for chosen LLMs and vector stores. Optional packages for specific integrations are available.
  • Resources: A Colab notebook is available for trying Vanna without a local install.
  • Links: Documentation, Colab Notebook

Highlighted Details

  • Supports a wide range of LLMs (OpenAI, Anthropic, Gemini, HuggingFace, Ollama, etc.) and vector stores (ChromaDB, PineCone, Qdrant, Weaviate, etc.).
  • Connects to numerous SQL databases including PostgreSQL, MySQL, Snowflake, BigQuery, and SQL Server.
  • Offers multiple UI integrations: Jupyter Notebook, Streamlit, Flask, and Slack.
  • Claims high accuracy on complex datasets, with self-learning capabilities to improve future query generation based on user feedback.
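One way to picture the feedback-based self-learning mentioned above is a store of approved question/SQL pairs that are reused as examples for later, similar questions. The sketch below is an assumption about how such a loop could work, not a description of Vanna's internals; `FeedbackMemory`, `record`, and `examples_for` are hypothetical names, and word overlap stands in for embedding similarity.

```python
import re


def words(text: str) -> set:
    """Lowercase word set used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings, in [0, 1]."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


class FeedbackMemory:
    """Hypothetical store of question/SQL pairs that a user has approved."""

    def __init__(self) -> None:
        self.pairs = []

    def record(self, question: str, sql: str, approved: bool) -> None:
        # Only answers the user confirmed as correct are kept for reuse.
        if approved:
            self.pairs.append((question, sql))

    def examples_for(self, question: str, k: int = 1) -> list:
        # Return the k approved pairs whose questions are most similar.
        ranked = sorted(self.pairs, key=lambda p: jaccard(question, p[0]), reverse=True)
        return ranked[:k]


memory = FeedbackMemory()
memory.record(
    "How many orders were placed in January?",
    "SELECT COUNT(*) FROM orders WHERE created_at >= DATE '2024-01-01' AND created_at < DATE '2024-02-01'",
    approved=True,
)
memory.record("List all employees", "SELECT * FROM orders", approved=False)  # rejected, not stored

examples = memory.examples_for("How many orders were placed in February?")
print(examples)
```

The retrieved pair would be added to the prompt context for the new question, so each confirmed answer makes similar future questions easier to get right.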

Maintenance & Community

  • Active development with community support available via Discord.
  • Links: Discord

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

  • Accuracy is directly tied to the quality and quantity of training data provided.
  • While flexible, setting up specific LLM and vector store integrations may require additional configuration.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 6
  • Star history: 506 stars in the last 30 days
