DuckDB-NSQL by NumbersStationAI

Text-to-SQL model for local DuckDB analytics

created 1 year ago
306 stars

Top 88.6% on sourcepulse

Project Summary

This project provides DuckDB-NSQL, a foundation model (FM) designed to generate SQL queries for local DuckDB analytics. It targets data analysts and engineers who want to query DuckDB databases in natural language, without needing extensive SQL knowledge.

How It Works

DuckDB-NSQL is an autoregressive language model trained on synthetically generated DuckDB SQL queries and on text-to-SQL pairs transpiled to DuckDB SQL. It is a member of the NSQL family of models, which are optimized for SQL generation. The model can be run locally with llama.cpp for efficient inference, so its output can be executed directly on a DuckDB connection to answer natural-language questions.
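The flow described above (schema and question into a prompt, prompt into the model, generated SQL into DuckDB) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the prompt layout is an assumption, and `generate` is a hypothetical callable standing in for whatever sends the prompt to the locally hosted model. `con` only needs `execute(sql).fetchall()`, which a DuckDB connection provides.

```python
from typing import Any, Callable, List, Tuple

# Assumed prompt layout for illustration only; check the model card for the
# template the released weights actually expect.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Your task is to generate valid DuckDB SQL to answer the question, "
    "given a DuckDB database schema.\n\n"
    "### Input:\n{schema}\n\n"
    "### Question:\n{question}\n\n"
    "### Response:\n"
)


def build_prompt(schema: str, question: str) -> str:
    """Combine the table definitions and the user's question into one prompt."""
    return PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())


def clean_sql(completion: str) -> str:
    """Keep the text before the first semicolon.

    Naive on purpose: a semicolon inside a string literal would cut the
    statement short.
    """
    return completion.strip().split(";")[0].strip()


def ask(
    con: Any,
    generate: Callable[[str], str],  # hypothetical wrapper around the local model
    schema: str,
    question: str,
) -> Tuple[str, List[tuple]]:
    """Generate SQL for `question`, run it on `con`, and return (sql, rows)."""
    sql = clean_sql(generate(build_prompt(schema, question)))
    rows = con.execute(sql).fetchall()
    return sql, rows
```

In practice `generate` could wrap a llama.cpp binding, and `schema` could hold the `CREATE TABLE` statements of the connected database. Returning the SQL alongside the rows lets the caller inspect what was actually executed.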

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, llama.cpp (for hosting the model), and DuckDB. Model weights are available on HuggingFace (e.g., motherduckdb/DuckDB-NSQL-7B-v0.1-GGUF).
  • Usage: Examples provided in the examples/ folder demonstrate connecting to DuckDB and querying data.
  • Links: HuggingFace, Examples

Highlighted Details

  • 7B parameter model available in multiple formats (e.g., GGUF).
  • Trained on 200k synthetic DuckDB SQL queries and NSText2SQL data transpiled to DuckDB SQL.
  • Supports querying local DuckDB databases via natural language.
  • Includes evaluation benchmarks and scripts in the eval/ folder.
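The summary does not say how the scripts in eval/ score a prediction. One common way to judge text-to-SQL output is execution matching: run the predicted and the reference query on the same database and compare the rows. The sketch below assumes that approach; `execution_match` is a hypothetical helper, not a function from the repository.

```python
from collections import Counter
from typing import Any


def execution_match(con: Any, predicted_sql: str, gold_sql: str) -> bool:
    """Return True when both queries yield the same multiset of rows.

    Row order is ignored, so this does not check ORDER BY behaviour.
    A predicted query that fails to run counts as a miss. Rows must be
    hashable, so columns holding lists or structs would need converting first.
    """
    gold_rows = con.execute(gold_sql).fetchall()
    try:
        predicted_rows = con.execute(predicted_sql).fetchall()
    except Exception:
        return False
    return Counter(map(tuple, predicted_rows)) == Counter(map(tuple, gold_rows))
```

Comparing multisets rather than lists avoids penalizing a correct query that returns rows in a different order, at the cost of not checking ordering when the question asks for it.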

Maintenance & Community

  • Developed by NumbersStationAI.
  • Model weights hosted by motherduckdb.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because the README does not state a license, suitability for commercial use is unclear. The model is positioned as a foundation model for DuckDB SQL analytics, so it may struggle with highly complex or niche SQL constructs that its training data does not cover.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 90 days
