SAG by Zleap-AI

SQL-driven RAG engine for dynamic knowledge graph construction

Created 3 weeks ago

489 stars

Top 63.1% on SourcePulse

Project Summary

This project addresses the challenge of enabling machines to "understand" and "relate" vast amounts of text data without the overhead of maintaining large, static knowledge graphs. SAG is a SQL-driven RAG engine that dynamically constructs knowledge graphs during query execution. It targets developers, enterprise technical teams, and researchers interested in advanced RAG and GraphRAG techniques, offering precise information recall and full traceability.

How It Works

SAG employs an event-centric architecture, atomizing documents into discrete, semantically complete "events" and extracting multi-dimensional "natural language vectors" (entities) for each. Instead of pre-building a graph, it dynamically constructs relationship networks at query time. This is powered by a three-stage search: entity-driven Recall, multi-hop BFS Expand, and a weighted PageRank-based Rerank, combining SQL retrieval, vector search, and graph traversal for nuanced understanding.
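As a rough illustration of the three-stage search described above, the sketch below runs Recall, BFS Expand, and a weighted PageRank-style Rerank over a toy in-memory event set. It is not SAG's actual code: the EVENTS data, the function names, the shared-entity edge weighting, and the seed-biased teleport are all assumptions made for this example, and a plain dictionary stands in for the SQL and vector stores.

```python
from collections import defaultdict, deque

# Hypothetical toy data: event id -> set of entity strings.
EVENTS = {
    "e1": {"alice", "paris"},
    "e2": {"alice", "contract"},
    "e3": {"contract", "bob"},
    "e4": {"carol", "tokyo"},
}

def recall(query_entities, events):
    """Stage 1: seed events that mention at least one query entity."""
    return {eid for eid, ents in events.items() if ents & query_entities}

def expand(seeds, events, max_hops=2):
    """Stage 2: BFS from the seeds, linking events that share an entity.

    The links are derived at query time from an entity -> events index;
    no graph is stored beforehand. Returns event id -> hop distance.
    """
    index = defaultdict(set)
    for eid, ents in events.items():
        for ent in ents:
            index[ent].add(eid)
    hops = {eid: 0 for eid in seeds}
    queue = deque(sorted(seeds))
    while queue:
        cur = queue.popleft()
        if hops[cur] == max_hops:
            continue
        for ent in events[cur]:
            for nxt in index[ent]:
                if nxt not in hops:
                    hops[nxt] = hops[cur] + 1
                    queue.append(nxt)
    return hops

def rerank(hops, events, damping=0.85, iterations=50):
    """Stage 3: weighted PageRank over the expanded subgraph.

    Edge weight = number of shared entities (an assumption); the
    teleport step jumps back to the seed events only.
    """
    nodes = sorted(hops)
    weights = {
        a: {b: len(events[a] & events[b])
            for b in nodes if b != a and events[a] & events[b]}
        for a in nodes
    }
    seeds = [n for n in nodes if hops[n] == 0]
    teleport = {n: (1.0 / len(seeds) if hops[n] == 0 else 0.0) for n in nodes}
    score = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for a in nodes:
            total = sum(weights[a].values())
            if total == 0:
                # Dangling node: hand its mass back to the seeds.
                for n in nodes:
                    nxt[n] += damping * score[a] * teleport[n]
                continue
            for b, w in weights[a].items():
                nxt[b] += damping * score[a] * w / total
        score = nxt
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)

def search(query_entities, events, max_hops=2):
    """Run Recall, Expand, and Rerank; returns (event id, score) pairs."""
    seeds = recall(query_entities, events)
    if not seeds:
        return []
    return rerank(expand(seeds, events, max_hops), events)
```

Calling search({"alice"}, EVENTS) seeds the search with e1 and e2, reaches e3 through the shared "contract" entity, and never touches e4, which shares nothing with the query neighborhood. Raising or lowering max_hops trades recall for the size of the subgraph that has to be reranked.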

Quick Start & Requirements

  • Installation: Recommended via Docker Compose (docker compose up -d).
  • Prerequisites: Python 3.11+, MySQL, Elasticsearch/VecDB (implied by the storage architecture), and an LLM API key. NLTK data must be downloaded with scripts/download_nltk_data.py.
  • Access: Frontend available at http://localhost:3000, API documentation at http://localhost/api/docs.
  • Links: GitHub Repository, Zleap.ai (for full version demo).

Highlighted Details

  • Event Atomization: Transforms documents into discrete, semantically complete "events" rather than fixed text chunks.
  • Dynamic Graph Construction: Builds knowledge graph relationships on-the-fly during query execution, avoiding static graph maintenance.
  • Three-Stage Search: Combines entity recall, multi-hop BFS expansion, and PageRank-based reranking for comprehensive and accurate retrieval.
  • Explainable Results: Outputs detailed JSON with event scores and traceable "clues" from the search pipeline, enhancing interpretability.
  • Flexible Entity System: Supports default 5W1H entities (Time, Location, Person, Topic, Action, Tags) and allows custom entity type definitions for domain adaptation.
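To make the event and entity ideas in the list above more concrete, here is a minimal sketch of how an atomized event with typed entities and an explainable result might be represented. The Event class, its field names, the lowercase type keys, and the shape of the JSON output are all hypothetical; the summary does not specify SAG's real schema.

```python
import json
from dataclasses import dataclass, field, asdict

# Default entity types as listed above; the lowercase keys are an assumption.
DEFAULT_ENTITY_TYPES = ("time", "location", "person", "topic", "action", "tags")

@dataclass
class Event:
    """One semantically complete unit extracted from a document (hypothetical)."""
    event_id: str
    source_doc: str
    text: str
    entities: dict = field(default_factory=dict)  # entity type -> list of values

def validate(event, extra_types=()):
    """Reject entity types that are neither default nor explicitly registered."""
    allowed = set(DEFAULT_ENTITY_TYPES) | set(extra_types)
    unknown = set(event.entities) - allowed
    if unknown:
        raise ValueError(f"unknown entity types: {sorted(unknown)}")
    return event

def explain(event, score, clues):
    """Serialize a result with its score and the clues that led to it."""
    payload = {"event": asdict(event), "score": score, "clues": clues}
    return json.dumps(payload, indent=2)

# Usage: a medical deployment registers a custom "drug" entity type.
event = validate(
    Event(
        event_id="e1",
        source_doc="report.pdf",
        text="Dr. Lee prescribed aspirin in Boston on 2024-03-01.",
        entities={
            "person": ["Dr. Lee"],
            "location": ["Boston"],
            "time": ["2024-03-01"],
            "drug": ["aspirin"],
        },
    ),
    extra_types=("drug",),
)
print(explain(event, score=0.42, clues=["query entity 'aspirin' matched drug"]))
```

Keeping the source document and the clue list next to each score is what would let a reader trace a ranked event back to the text and the matching step that produced it.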

Maintenance & Community

Maintained by Zleap.AI, with compute support from 302.AI. Community engagement is encouraged via their Discord channel and Twitter handle (@ZleapAI). Standard contribution guidelines are provided for developers wishing to participate.

Licensing & Compatibility

Licensed under the Apache-2.0 License, which is permissive and allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The open-source version provides the core engine but omits advanced features such as automatic web scraping, multi-source ingestion, content publishing, team collaboration, and cloud services available in the commercial offering. Production deployment may require significant computational resources due to the integrated database and LLM components.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 517 stars in the last 23 days
