sqlova by naver

Semantic parser for translating natural language to SQL queries

created 6 years ago
645 stars

Project Summary

SQLova is a neural semantic parser that translates natural-language questions into SQL queries. It is aimed at researchers and developers who work with structured data. By combining BERT-based contextual embeddings with an execution-guided decoding strategy, it achieved state-of-the-art results on the WikiSQL benchmark when it was released.

How It Works

SQLova encodes the question together with the table's column headers using BERT, which yields table- and context-aware word embeddings. On top of this encoder sits a sequence-to-SQL model that builds on the architecture of SQLNet, which uses column attention and a sequence-to-set structure. The SQLova-EG variant adds execution-guided decoding. During decoding, candidate queries are executed against the database, and those that fail to execute or return an empty result are discarded. This filtering improves accuracy.
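To make the table-aware encoding concrete, here is a minimal Python sketch that packs a question and the column headers into a single BERT input. It assumes a layout along the lines of the one described in the SQLova paper: [CLS] question [SEP] header [SEP] header [SEP] and so on. The function name build_table_aware_input and the segment-id convention (0 for question tokens, 1 for header tokens) are illustrative assumptions. Real inputs would be WordPiece tokens produced by a BERT tokenizer, not the hand-split tokens used here.

```python
from typing import List, Tuple


def build_table_aware_input(
    question_tokens: List[str],
    header_tokens: List[List[str]],
) -> Tuple[List[str], List[int], List[Tuple[int, int]]]:
    """Hypothetical helper: concatenate a question and column headers for BERT.

    Returns the token sequence, segment ids (assumed: 0 = question, 1 = headers),
    and the [start, end) span of each header inside the sequence.
    """
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    header_spans = []
    for header in header_tokens:
        start = len(tokens)
        tokens.extend(header)
        header_spans.append((start, len(tokens)))  # span excludes the trailing [SEP]
        tokens.append("[SEP]")
    # Every token added after the question belongs to the header segment.
    segment_ids.extend([1] * (len(tokens) - len(segment_ids)))
    return tokens, segment_ids, header_spans


tokens, segment_ids, spans = build_table_aware_input(
    ["who", "plays", "guard"],
    [["player"], ["school", "/", "club", "team"]],
)
print(tokens)
print(spans)
```

The returned spans give one plausible way to pool each header's contextual vectors into a single per-column representation, which column-attention layers can then consume.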

Quick Start & Requirements

  • Install: Requires Python 3.6+, PyTorch 0.4.0+, and the Python libraries babel, matplotlib, defusedxml, and tqdm. CUDA 9.0 is recommended for GPU acceleration.
  • Data: Download the WikiSQL dataset and place it in $HOME/data/WikiSQL-1.1/data.
  • BERT: The pre-trained BERT parameters must be converted from TensorFlow to PyTorch format. Pre-converted files are also provided.
  • Training: Run python3 train.py. Command-line arguments control the batch size, the learning rate, and whether BERT is fine-tuned. On a single Tesla M40 GPU, training takes about 12 hours and reaches roughly 79% logical form accuracy.
  • Docs: arXiv Manuscript
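WikiSQL stores each target query as a structured sketch rather than as raw SQL. The sketch consists of a selected-column index, an aggregation index, and a list of [column, operator, value] conditions. These are the slots a SQLNet-style model predicts. The Python sketch below renders such a record into a SQL string. The operator tables mirror those in WikiSQL's reference code. The helpers sketch_to_sql and quote_literal, the placeholder table name t, and the example record are hypothetical.

```python
import json
from typing import List

# Operator tables as defined in WikiSQL's reference code (lib/query.py).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def quote_literal(value) -> str:
    """Render a condition value as a SQL literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def sketch_to_sql(sql: dict, header: List[str], table_name: str = "t") -> str:
    """Hypothetical helper: turn a WikiSQL-style sketch dict into a SQL string."""
    column = '"' + header[sql["sel"]] + '"'
    agg = AGG_OPS[sql["agg"]]
    select = f"{agg}({column})" if agg else column
    query = f"SELECT {select} FROM {table_name}"
    conditions = [
        f'"{header[col]}" {COND_OPS[op]} {quote_literal(value)}'
        for col, op, value in sql["conds"]
    ]
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


# A made-up record in WikiSQL's JSON-lines format.
line = '{"question": "What position does the player from Butler CC (KS) play?", "sql": {"sel": 2, "agg": 0, "conds": [[3, 0, "Butler CC (KS)"]]}}'
record = json.loads(line)
header = ["Player", "No.", "Position", "School/Club Team"]
print(sketch_to_sql(record["sql"], header))
# SELECT "Position" FROM t WHERE "School/Club Team" = 'Butler CC (KS)'
```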

Highlighted Details

  • Achieved 83.6% logical form accuracy and 89.6% execution accuracy on the WikiSQL test set with the SQLova-EG model.
  • Utilizes BERT for enhanced natural language understanding and table context awareness.
  • Implements execution-guided decoding for improved SQL query generation accuracy.
  • The codebase is derived from SQLNet's implementation, with substantial rewrites.
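Execution-guided decoding can be illustrated with a simplified Python sketch that reranks complete candidate queries using the standard-library sqlite3 module. Candidates are tried from the highest model score to the lowest. Any candidate that raises an execution error, or that returns no rows (or only NULLs), is skipped. This is a deliberate simplification, because SQLova-EG applies the check during beam search rather than to finished queries. The function name execution_guided_pick, the scores, and the toy table are all hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


def execution_guided_pick(
    conn: sqlite3.Connection, candidates: List[Tuple[float, str]]
) -> Optional[str]:
    """Return the best-scoring candidate that executes and yields a non-NULL result."""
    for _score, query in sorted(candidates, key=lambda c: c[0], reverse=True):
        try:
            rows = conn.execute(query).fetchall()
        except sqlite3.Error:
            continue  # e.g. unknown column or malformed SQL
        if rows and any(value is not None for row in rows for value in row):
            return query
    return None  # no candidate survived the execution check


# Toy single-table database standing in for a WikiSQL table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("Player" TEXT, "Position" TEXT, "School/Club Team" TEXT)')
conn.execute("INSERT INTO t VALUES (?, ?, ?)", ("A. Smith", "Guard", "Butler CC (KS)"))

# (model score, query) pairs; higher score = preferred by the model.
candidates = [
    (-0.2, "SELECT Positon FROM t"),  # misspelled column: raises an error
    (-0.4, """SELECT "Position" FROM t WHERE "School/Club Team" = 'Butler CC'"""),  # no rows
    (-0.9, """SELECT "Position" FROM t WHERE "School/Club Team" = 'Butler CC (KS)'"""),
]
best = execution_guided_pick(conn, candidates)
print(best)
```

Here the top-scoring candidate references a nonexistent column and the second matches no rows, so the lowest-scored but executable query is returned.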

Maintenance & Community

Developed by Clova AI Research, NAVER Corp. The project is research-oriented and accompanies a paper associated with NeurIPS 2019. The README lists no community channels such as Discord or Slack.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. This permissive license generally allows commercial use and inclusion in closed-source products, provided its notice and attribution requirements are met.

Limitations & Caveats

The project is research-focused and does not appear to be actively maintained. Its dependencies, such as PyTorch 0.4.0, are old, so running it in a modern environment may require significant porting effort. The README also notes compatibility issues with newer versions of pytorch-pretrained-BERT.

Health Check

Last commit: 5 years ago
Responsiveness: Inactive
Pull requests (30d): 0
Issues (30d): 0
Star history: 3 stars in the last 90 days
