dat by hexinfo

Enterprise framework for natural language data querying

Created 10 months ago
251 stars

Top 99.8% on SourcePulse

Project Summary

Summary

DAT (Data Ask Tool) is an enterprise-grade AI framework for querying databases in natural language. It lets business users work with data directly without writing SQL, and it relies on a pre-modeled semantic layer to keep results accurate.

How It Works

DAT employs an "Askdata Agent workflow" that prioritizes result quality. It uses LLMs to understand the natural language question and generate semantic SQL, which is then translated into database-specific SQL. A key feature is its rich semantic modeling (entities, dimensions, measures), defined in YAML, which guides the LLM toward precise queries. Vectorized retrieval supplements query understanding with stored knowledge.
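The translation from a semantic model to database-specific SQL can be illustrated with a small sketch. This is not DAT's implementation or API: the class names, the `render_sql` function, and the example table and columns are all hypothetical, and the model is written as Python objects rather than DAT's YAML format. It only shows how one semantic definition can yield different SQL per dialect, assuming the dialects differ in identifier quoting and row limiting.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    name: str    # semantic name a user would refer to
    column: str  # physical column (hypothetical)


@dataclass(frozen=True)
class Measure:
    name: str    # semantic name, also used as the output alias
    column: str  # physical column (hypothetical)
    agg: str     # aggregate function, e.g. "SUM" or "COUNT"


@dataclass
class SemanticModel:
    entity: str  # business entity the model describes
    table: str   # physical table (hypothetical)
    dimensions: dict[str, Dimension] = field(default_factory=dict)
    measures: dict[str, Measure] = field(default_factory=dict)


# Identifier quote characters per dialect (MySQL uses backticks).
QUOTES = {"mysql": "`", "postgresql": '"', "oracle": '"', "duckdb": '"'}


def quote(name: str, dialect: str) -> str:
    q = QUOTES[dialect]
    return f"{q}{name}{q}"


def render_sql(model, measure_name, dimension_name, dialect, limit=None):
    """Render one measure grouped by one dimension for the given dialect."""
    if dialect not in QUOTES:
        raise ValueError(f"unsupported dialect: {dialect}")
    m = model.measures[measure_name]
    d = model.dimensions[dimension_name]
    dim_col = quote(d.column, dialect)
    sql = (
        f"SELECT {dim_col}, {m.agg}({quote(m.column, dialect)}) "
        f"AS {quote(m.name, dialect)} "
        f"FROM {quote(model.table, dialect)} GROUP BY {dim_col}"
    )
    if limit is not None:
        # Oracle (12c and later) uses FETCH FIRST; the others accept LIMIT.
        if dialect == "oracle":
            sql += f" FETCH FIRST {int(limit)} ROWS ONLY"
        else:
            sql += f" LIMIT {int(limit)}"
    return sql


# Hypothetical model: an "order" entity with one dimension and one measure.
model = SemanticModel(
    entity="order",
    table="orders",
    dimensions={"region": Dimension("region", "region_code")},
    measures={"revenue": Measure("revenue", "amount", "SUM")},
)

for dialect in ("mysql", "oracle"):
    print(render_sql(model, "revenue", "region", dialect, limit=5))
```

In a workflow like the one described, the LLM would only need to choose the semantic names ("revenue", "region"); the physical names and dialect details come from the model, which is one way a semantic layer can constrain the generated SQL.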

Quick Start & Requirements

  • Requirements: Java 17+, a supported database (MySQL, PostgreSQL, Oracle, DuckDB), and an LLM API key (OpenAI, Anthropic, etc.).
  • Quick Start: Install the DAT CLI. Initialize a project (dat init), configure the database and LLM in dat_project.yaml, define semantic models in YAML, and run queries via dat run or dat server openapi.

Highlighted Details

  • Enterprise Architecture: Pluggable SPI, modular design, factory pattern.
  • Multi-Database Support: Native MySQL, PostgreSQL, Oracle, DuckDB; extensible via SPI.
  • Semantic SQL Generation: LLM-driven NLU, SQL dialect conversion, semantic model binding.
  • Semantic Modeling: YAML-defined entities, dimensions (time, categorical), measures.
  • Vectorized Retrieval: Uses embeddings to retrieve stored SQL Q&A pairs, synonyms, and other knowledge that enrich the query.
  • Flexible Deployment: CLI, OpenAPI service, MCP service for agent integration.
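The vectorized retrieval item above can be sketched as a nearest-neighbor lookup over stored question/SQL pairs. This is a toy illustration, not DAT's code: the `embed` function is a bag-of-words stand-in for a real embedding model, and the stored pairs, function names, and table names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical stored knowledge: (example question, known-good SQL).
KNOWLEDGE = [
    ("total revenue by region",
     "SELECT region, SUM(amount) FROM orders GROUP BY region"),
    ("number of orders per month",
     "SELECT order_month, COUNT(*) FROM orders GROUP BY order_month"),
    ("average order amount by customer",
     "SELECT customer_id, AVG(amount) FROM orders GROUP BY customer_id"),
]


def retrieve(question: str, k: int = 1):
    """Return the k stored pairs whose questions are most similar."""
    qv = embed(question)
    ranked = sorted(
        KNOWLEDGE, key=lambda pair: cosine(qv, embed(pair[0])), reverse=True
    )
    return ranked[:k]


best_question, best_sql = retrieve("revenue by region last year")[0]
print(best_question)
print(best_sql)
```

The retrieved pair would then be passed to the LLM as context alongside the user's question, so that the generated SQL can follow a known-good example.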

Maintenance & Community

Actively developed, with a detailed "Development Items List" outlining planned features. Community engagement takes place via GitHub Discussions and WeChat.

Licensing & Compatibility

Licensed under Apache 2.0, permissive for commercial use and closed-source integration.

Limitations & Caveats

Key features under development include IDE plugins (VSCode, IDEA, Eclipse), LLM-assisted semantic model generation, comprehensive testing for core querying, and Jinja templating for data permissions. The project is at version 0.7.2, indicating ongoing development.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
