NumbersStationAI: Open-source Text-to-SQL foundation models for efficient database interaction
Numbers Station AI's NSQL is a family of open-source, autoregressive large foundation models specifically engineered for Text-to-SQL tasks. It addresses the need for accurate and efficient conversion of natural language queries into executable SQL statements, targeting engineers, researchers, and power users working with relational databases. The models offer a range of sizes, from 350M to 7B parameters, providing flexibility for deployment scenarios, including local execution with enhanced privacy.
How It Works
NSQL models use an autoregressive architecture, a common approach for sequence generation, specialized here for SQL. This focus lets them handle complex query structures, including joins and nested subqueries, and the project reports that they often outperform larger, general-purpose models. The models are designed to read database schemas and translate user intent into syntactically correct and semantically accurate SQL queries.
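Because the model conditions on the schema and the question, the input is typically a prompt that lists the tables and then asks the question. The sketch below assembles such a prompt. The layout (CREATE TABLE blocks, a "-- Using valid SQLite..." instruction, the question as a SQL comment, and a trailing "SELECT" for the model to continue) is an assumption modeled on the style shown on the NSQL HuggingFace model cards; check the card for the exact template. The names `Schema`, `schema_to_ddl`, and `build_prompt` are hypothetical helpers, not part of the NSQL repository.

```python
from typing import Dict, List, Tuple

# Hypothetical schema representation: table name -> list of (column, type) pairs.
Schema = Dict[str, List[Tuple[str, str]]]


def schema_to_ddl(schema: Schema) -> str:
    """Render each table as a CREATE TABLE statement, one column per line."""
    blocks = []
    for table, columns in schema.items():
        cols = ",\n".join(f"    {name} {col_type}" for name, col_type in columns)
        blocks.append(f"CREATE TABLE {table} (\n{cols}\n)")
    return "\n\n".join(blocks)


def build_prompt(schema: Schema, question: str, dialect: str = "SQLite") -> str:
    """Assemble schema, instruction, and question (assumed template).

    The prompt ends with "SELECT" so an autoregressive model continues
    the query instead of producing free-form text.
    """
    return (
        f"{schema_to_ddl(schema)}\n\n"
        f"-- Using valid {dialect}, answer the following questions "
        "for the tables provided above.\n\n"
        f"-- {question}\n\n"
        "SELECT"
    )


if __name__ == "__main__":
    demo_schema: Schema = {
        "stadium": [("stadium_id", "number"), ("name", "text"), ("capacity", "number")]
    }
    print(build_prompt(demo_schema, "What is the maximum capacity of stadiums?"))
```

Ending the prompt with "SELECT" constrains the first generated tokens to a query body, which is why the completion must later be re-prefixed with "SELECT" before execution.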
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Requirements include the manifest library and database connectors (examples are provided for Postgres and SQLite). Model weights are available on HuggingFace. To run a model, start a server with python3 -m manifest.api.app ... and then interact with it via a Python client, as demonstrated in the examples/ directory.
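On the database side, a connector's job is to pull the schema into the prompt and run the generated query. The following is a minimal sketch using only Python's standard-library `sqlite3` module; it does not call a model. The completion string is a hard-coded, hypothetical stand-in for model output, and the helpers `sqlite_schema_ddl`, `completion_to_sql`, and `run_generated_sql` are illustrative names, not functions from the repository's examples/ directory.

```python
import sqlite3


def sqlite_schema_ddl(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements SQLite stores in sqlite_master."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)


def completion_to_sql(completion: str) -> str:
    """Re-attach the 'SELECT' the prompt ended with and keep the first statement.

    Naive on purpose: a ';' inside a string literal would truncate the query.
    """
    full = f"SELECT {completion.lstrip()}"
    return full.split(";")[0].strip()


def run_generated_sql(conn: sqlite3.Connection, completion: str) -> list:
    """Execute the cleaned-up query and return all result rows."""
    return conn.execute(completion_to_sql(completion)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stadium (stadium_id INTEGER, name TEXT, capacity INTEGER)")
    conn.executemany(
        "INSERT INTO stadium VALUES (?, ?, ?)", [(1, "North", 5000), (2, "South", 12000)]
    )
    print(sqlite_schema_ddl(conn))
    # Hypothetical model output: the model continues after "SELECT" and may
    # keep generating past the end of the query, so everything after ';' is dropped.
    fake_completion = " MAX(capacity) FROM stadium;\n\n-- next question"
    print(run_generated_sql(conn, fake_completion))  # [(12000,)]
```

Since the SQL comes from a model, running it against a read-only connection (for example `sqlite3.connect("file:data.db?mode=ro", uri=True)`) is a sensible precaution.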
Maintenance & Community
The project lists Vishal Motwani, Sen Wu, and Laurel Orr as contributors. No specific community channels (like Discord or Slack) or roadmap details are provided in the README.
Licensing & Compatibility
The code in this repository is licensed under the permissive Apache 2.0 license, which generally allows for commercial use. However, the datasets used for training NSQL models have diverse licenses, including CC-BY-4.0, MIT, Apache-2.0, BSD 3-Clause, and others. Users must adhere to the terms of these original dataset licenses, including any attribution requirements, which may impose restrictions on derived works or redistribution.
Limitations & Caveats
The primary caveat for adoption is the varied licensing of the training data; users must carefully review and comply with the original licenses of each dataset used in the NSText2SQL corpus. The README does not specify any known bugs, unsupported platforms, or deprecation plans.