vanna  by vanna-ai

Python RAG framework for SQL generation

created 2 years ago
19,674 stars

Top 2.3% on sourcepulse

Project Summary

Vanna is an open-source Python RAG framework designed to enable users to interact with SQL databases using natural language. It targets data analysts, engineers, and business users who need to query data without writing SQL, offering accurate text-to-SQL generation and visualization capabilities.

How It Works

Vanna uses a Retrieval-Augmented Generation (RAG) approach. It first "trains" on a database by ingesting its schema (DDL), documentation, and existing SQL queries, and stores this information in a vector database. When a user asks a question, Vanna retrieves the relevant schema and documentation from the vector store and includes it in the prompt sent to a chosen Large Language Model (LLM), which generates the SQL query. Compared with fine-tuning, this approach is portable across LLMs, makes training data easy to update, costs less, and is more future-proof.
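The retrieve-then-prompt flow described above can be sketched in a few lines. This is a toy illustration, not Vanna's API: `ToyStore`, `build_prompt`, and the sample tables are hypothetical names, and a bag-of-words cosine score stands in for the embeddings a real vector database would use. The LLM call is left out; the sketch stops at the prompt that would be sent to it.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyStore:
    """Hypothetical stand-in for a vector store (assumption, not Vanna's class)."""

    def __init__(self):
        self.items = []  # list of (kind, text, token counts)

    def train(self, kind, text):
        # "Training" here only indexes the text; no model weights change.
        self.items.append((kind, text, Counter(tokenize(text))))

    def retrieve(self, question, k=2):
        # Rank stored items by similarity to the question, keep the top k.
        q = Counter(tokenize(question))
        ranked = sorted(self.items, key=lambda it: cosine(q, it[2]), reverse=True)
        return [(kind, text) for kind, text, _ in ranked[:k]]


def build_prompt(store, question, k=2):
    """Assemble the retrieved context and the question into an LLM prompt."""
    context = "\n".join(f"[{kind}] {text}" for kind, text in store.retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


store = ToyStore()
store.train("ddl", "CREATE TABLE customers (id INT, name TEXT, country TEXT)")
store.train("ddl", "CREATE TABLE orders (id INT, customer_id INT, total NUMERIC)")
store.train("documentation", "The total column in orders is the order value in USD")

prompt = build_prompt(store, "What is the total value of orders per customer?", k=2)
print(prompt)
```

Only the two most relevant items reach the prompt, which is why the quality of the ingested DDL and documentation matters: the LLM sees nothing about the database beyond what retrieval returns.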

Quick Start & Requirements

  • Install: pip install vanna
  • Prerequisites: Requires API keys for chosen LLMs and vector stores. Optional packages for specific integrations are available.
  • Resources: A Colab notebook is available for trying Vanna without a local setup.
  • Links: Documentation, Colab Notebook

Highlighted Details

  • Supports a wide range of LLMs (OpenAI, Anthropic, Gemini, Hugging Face, Ollama, etc.) and vector stores (ChromaDB, Pinecone, Qdrant, Weaviate, etc.).
  • Connects to numerous SQL databases including PostgreSQL, MySQL, Snowflake, BigQuery, and SQL Server.
  • Offers multiple UI integrations: Jupyter Notebook, Streamlit, Flask, and Slack.
  • Claims high accuracy on complex datasets, with self-learning capabilities to improve future query generation based on user feedback.
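
One way to picture the feedback-based self-learning mentioned above is a store of approved question/SQL pairs that later questions can draw on as examples. The sketch below is an assumption about the general idea, not Vanna's implementation: `FeedbackMemory`, `record`, and `closest` are hypothetical names, and Jaccard word overlap stands in for real embedding similarity.

```python
import re


def tokens(text):
    """Lowercase word set used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


class FeedbackMemory:
    """Hypothetical store of question/SQL pairs that a user approved."""

    def __init__(self):
        self.pairs = []  # list of (question, sql)

    def record(self, question, sql, approved):
        # Only queries the user confirmed as correct are kept as examples.
        if approved:
            self.pairs.append((question, sql))

    def closest(self, question):
        """Return the approved pair most similar to the question, or None."""
        q = tokens(question)
        best, best_score = None, 0.0
        for past_question, sql in self.pairs:
            p = tokens(past_question)
            union = q | p
            score = len(q & p) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = (past_question, sql), score
        return best


memory = FeedbackMemory()
memory.record(
    "How many orders per country?",
    "SELECT country, COUNT(*) FROM orders GROUP BY country",
    approved=True,
)
# A rejected answer is not stored, so it cannot mislead later prompts.
memory.record("List all customers", "SELECT * FROM orderz", approved=False)

example = memory.closest("How many orders were placed per country last month?")
print(example)
```

In a RAG setup, a pair returned by `closest` would be added to the prompt as a worked example, so each approved answer improves the context available for similar future questions.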

Maintenance & Community

  • Active development with community support available via Discord.
  • Links: Discord

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

  • Accuracy is directly tied to the quality and quantity of training data provided.
  • While flexible, setting up specific LLM and vector store integrations may require additional configuration.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 12
  • Issues (30d): 7
  • Star History: 2,474 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Elie Bursztein (Cybersecurity Lead at Google DeepMind).

LightRAG by HKUDS

1.0%
19k
RAG framework for fast, simple retrieval-augmented generation
created 10 months ago
updated 22 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 7 more.

mindsdb by mindsdb

0.5%
35k
AI query engine for federated data sources
created 7 years ago
updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Anton Troynikov (Cofounder of Chroma), and 20 more.

llama_index by run-llama

0.3%
43k
Data framework for building LLM-powered agents
created 2 years ago
updated 23 hours ago