Discover and explore top open-source AI tools and projects—updated daily.
distil-labsLocal Text-to-SQL for plain English data querying
Top 93.7% on SourcePulse
This project provides a fine-tuned, small language model (SLM) for converting natural language questions into executable SQL queries. It targets users who need to query data locally, ensuring privacy, offline capability, and avoiding cloud dependencies. The key benefit is enabling users to interact with their CSV data using plain English, achieving accuracy comparable to much larger cloud-based LLMs while running efficiently on local hardware.
How It Works
The core approach involves fine-tuning the Qwen3 family of small language models on a dataset of approximately 10,000 synthetic Text2SQL examples. This process specifically trains the model to translate natural language questions and database schemas into correct SQL syntax. The project highlights that off-the-shelf small models struggle with this task, necessitating fine-tuning. The advantage lies in achieving high accuracy (80% LLM-as-a-Judge, 60% Exact Match with the 4B model) with significantly smaller model sizes (4B or 0.6B parameters) compared to large, cloud-hosted models, enabling local execution and enhanced privacy.
Quick Start & Requirements
python -m venv .venv
. .venv/bin/activate
pip install huggingface_hub openai pandas
# Download the recommended 4-bit quantized model (~2.5GB)
huggingface-cli download distil-labs/distil-qwen3-4b-text2sql-gguf-4bit --local-dir distil-model
cd distil-model
ollama create distil-qwen3-4b-text2sql -f Modelfile
cd ..
python app.py --csv example_data/employees.csv --question "How many employees are in each department?"
pip packages (huggingface_hub, openai, pandas).distil-qwen3-4b-text2sql-gguf-4bit for local use.Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (like Discord/Slack), sponsorships, or roadmaps were found in the provided text.
Licensing & Compatibility
The specific open-source license for this project and its models is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the underlying model and project licenses.
Limitations & Caveats
The model achieves approximately 80% accuracy, meaning roughly 1 in 5 generated SQL queries may require manual review or adjustment. Users are advised to always use the --show-sql flag to inspect generated queries before execution. The model generates SQLite-compatible SQL, and integration with other database systems may require manual adaptation of the SQL syntax.
2 months ago
Inactive