DuckDB-NSQL by NumbersStationAI

Text-to-SQL model for local DuckDB analytics

Created 1 year ago
315 stars

Project Summary

This project provides DuckDB-NSQL, a foundation model (FM) designed to generate DuckDB SQL for local analytics. It targets data analysts and engineers who want to query DuckDB databases in natural language, so they can explore data without writing every query by hand.

How It Works

DuckDB-NSQL is an autoregressive language model trained on synthetically generated DuckDB SQL queries and on text-to-SQL pairs transpiled into the DuckDB dialect. It builds on the NSQL family of models, which are optimized for SQL generation. The model can run locally through llama.cpp for efficient inference, and the SQL it generates can be executed directly against a DuckDB connection, which enables natural-language querying.
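The loop described above (schema and question in, SQL out, SQL executed against DuckDB) can be sketched as follows. This is a minimal illustration, not the project's code: `PROMPT_TEMPLATE`, `build_prompt`, `extract_sql`, and `ask` are hypothetical names, and the template layout is an assumption. Take the exact prompt format the model was trained on from its HuggingFace model card. The model and the database are passed in as plain callables, so the sketch runs with stubs.

```python
from typing import Callable, Sequence

# Hypothetical template: the real DuckDB-NSQL prompt format should be copied
# from the model card; this layout is only an assumption for illustration.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Generate valid DuckDB SQL that answers the question, given the schema.\n\n"
    "### Input:\n{schema}\n\n"
    "### Question:\n{question}\n\n"
    "### Response:\n"
)


def build_prompt(schema: str, question: str) -> str:
    """Fill the (hypothetical) template with CREATE TABLE text and a question."""
    return PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())


def extract_sql(completion: str) -> str:
    """Keep only the first statement of a raw completion.

    An autoregressive model may keep generating after the query, so text after
    the first ';' is dropped. Naive: a ';' inside a string literal would also
    end the statement.
    """
    statement = completion.split(";", 1)[0].strip()
    return statement + ";" if statement else ""


def ask(
    question: str,
    schema: str,
    generate: Callable[[str], str],
    execute: Callable[[str], Sequence[tuple]],
) -> tuple[str, Sequence[tuple]]:
    """Natural language -> SQL -> rows, with the model and database injected."""
    sql = extract_sql(generate(build_prompt(schema, question)))
    if not sql:
        raise ValueError("model returned no SQL")
    return sql, execute(sql)
```

With stubs, `ask("How many rides?", "CREATE TABLE taxi(fare DOUBLE);", lambda p: "SELECT count(*) FROM taxi; -- extra", lambda q: [(42,)])` returns `("SELECT count(*) FROM taxi;", [(42,)])`. In a real setup, `generate` would wrap a llama.cpp-hosted model and `execute` could be `lambda q: con.execute(q).fetchall()` on a `duckdb` connection. Injecting both keeps the prompt and post-processing logic testable without loading a 7B model.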

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, llama.cpp (for hosting the model), and DuckDB. Model weights are available on HuggingFace (e.g., motherduckdb/DuckDB-NSQL-7B-v0.1-GGUF).
  • Usage: The examples/ folder shows how to connect to DuckDB and query data.
  • Links: HuggingFace, Examples
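The two environment-specific pieces of a local setup are the schema text that goes into the prompt and a completion function backed by the GGUF weights. A hedged sketch of both follows. It assumes the `duckdb` and `llama-cpp-python` packages are installed (they may differ from what requirements.txt pins) and that a GGUF file has been downloaded from the HuggingFace repo. `MODEL_PATH`, `schema_ddl`, `load_schema`, and `make_generator` are hypothetical names. The sampling settings are illustrative choices, not values from the project.

```python
# Hypothetical local path to a GGUF file downloaded from HuggingFace.
MODEL_PATH = "path/to/DuckDB-NSQL-7B.gguf"


def schema_ddl(rows) -> str:
    """Join CREATE TABLE statements (first column of each row) into one block."""
    return "\n".join(row[0].rstrip(";") + ";" for row in rows if row and row[0])


def load_schema(db_path: str) -> str:
    """Read every table's CREATE statement from a DuckDB database file."""
    import duckdb  # imported lazily so schema_ddl works without duckdb installed

    con = duckdb.connect(db_path, read_only=True)
    try:
        # duckdb_tables() exposes each table's CREATE statement in its `sql` column
        return schema_ddl(con.execute("SELECT sql FROM duckdb_tables()").fetchall())
    finally:
        con.close()


def make_generator(model_path: str = MODEL_PATH):
    """Load the GGUF model once and return a prompt -> completion function."""
    from llama_cpp import Llama  # Python bindings for llama.cpp

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)

    def generate(prompt: str) -> str:
        # Low temperature plus a ';' stop sequence keeps output to one query
        out = llm(prompt, max_tokens=256, temperature=0.1, stop=[";"])
        return out["choices"][0]["text"]

    return generate
```

Loading the model once and returning a closure avoids reloading several gigabytes of weights for each question. Because the `;` stop sequence is excluded from the returned text, append it before executing the SQL if your tooling expects one.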

Highlighted Details

  • 7B-parameter model available in multiple formats (e.g., GGUF).
  • Trained on 200k synthetic DuckDB SQL queries and NSText2SQL data transpiled to DuckDB SQL.
  • Supports querying local DuckDB databases via natural language.
  • Includes evaluation benchmarks and scripts in the eval/ folder.
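The metrics computed by the eval/ scripts are not described in this summary. As a hedged illustration of one common text-to-SQL metric, execution accuracy, the sketch below counts a prediction as correct when its result rows match those of the gold query. By default it compares rows as multisets and ignores row order. `same_result` and `execution_accuracy` are hypothetical names, and this is not necessarily how the project scores its benchmarks.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def same_result(pred_rows: Iterable[Sequence], gold_rows: Iterable[Sequence],
                ordered: bool = False) -> bool:
    """Compare two result sets, as multisets unless `ordered` is True.

    Assumes scalar (hashable) column values; LIST or STRUCT columns returned
    as Python lists/dicts would need converting first.
    """
    pred = [tuple(r) for r in pred_rows]
    gold = [tuple(r) for r in gold_rows]
    return pred == gold if ordered else Counter(pred) == Counter(gold)


def execution_accuracy(pairs: Iterable[tuple[Optional[list], list]]) -> float:
    """Fraction of (pred_rows, gold_rows) pairs whose results match.

    A predicted query that failed to execute should be passed as None and
    counts as a miss.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for pred, gold in pairs
               if pred is not None and same_result(pred, gold))
    return hits / len(pairs)
```

For example, four pairs containing two matches, one failed query (`None`), and one mismatch give an execution accuracy of 0.5. Multiset comparison avoids penalizing queries that differ only in row order. Use `ordered=True` when the question asks for a sorted result.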

Maintenance & Community

  • Developed by NumbersStationAI.
  • Model weights hosted by motherduckdb.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the exact license, which may hinder commercial adoption. The model is presented as a foundation model for DuckDB SQL analytics, and because it is trained largely on synthetic DuckDB queries, it may struggle with highly complex or niche SQL constructs that are underrepresented in its training data.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Andreas Jansson (Cofounder of Replicate).

natural-sql by cfahlgren1

Text-to-SQL LLMs with strong performance
866 stars
Created 1 year ago
Updated 1 year ago