glacier-creative-git
Enhancing RAG with novel semantic graph traversal
Top 96.5% on SourcePulse
This research addresses limitations in traditional Retrieval Augmented Generation (RAG) systems by proposing novel semantic similarity graph (SSG) traversal algorithms. It aims to provide LLMs with more precise and accurate context by intelligently navigating knowledge bases, benefiting engineers and researchers developing advanced RAG applications.
How It Works
The project constructs a hierarchical SSG where nodes represent document chunks or sentences, connected by cosine similarity edges. Seven distinct traversal algorithms are introduced, including LLM-guided and triangulation-based methods, to navigate this graph. These algorithms move beyond simple vector similarity matching to identify and extract highly relevant information, enhancing RAG system accuracy.
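The graph construction and traversal described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the toy 2-D vectors, the `build_graph` and `greedy_traverse` helpers, the `top_k` and `max_hops` parameters, and the greedy stepping rule are all assumptions made for the example. It is not one of the seven algorithms from the research, and real embeddings would come from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_graph(embeddings, top_k=2):
    """Link each node to its top_k most similar other nodes (hypothetical helper)."""
    graph = {}
    for i, vec in enumerate(embeddings):
        sims = [(cosine(vec, other), j) for j, other in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)
        graph[i] = [j for _, j in sims[:top_k]]
    return graph


def greedy_traverse(graph, embeddings, query, max_hops=3):
    """Start at the node closest to the query, then repeatedly step to the
    unvisited neighbor most similar to the query (illustrative rule only)."""
    current = max(range(len(embeddings)), key=lambda i: cosine(embeddings[i], query))
    path = [current]
    for _ in range(max_hops):
        candidates = [n for n in graph[current] if n not in path]
        if not candidates:
            break
        current = max(candidates, key=lambda n: cosine(embeddings[n], query))
        path.append(current)
    return path


# Toy 2-D "embeddings" standing in for chunk or sentence vectors.
chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_graph(chunks, top_k=2)
path = greedy_traverse(graph, chunks, query=[1.0, 0.05], max_hops=2)
print(path)
```

The nodes on the returned path would be the chunks handed to the LLM as context. Unlike plain top-k vector matching, the walk can reach a chunk that is only weakly similar to the query but strongly connected to a relevant neighbor.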
Quick Start & Requirements
Installation involves cloning the repository, setting up a Python 3.12 virtual environment, and installing dependencies via pip install -r requirements.txt. Running the interactive Jupyter notebook research_demonstration.ipynb is recommended for exploration. Prerequisites include Python 3.12 and Ollama, with optional environment variables for various LLM APIs.
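Before opening the notebook, the two stated prerequisites can be checked with a short script. This is a sketch under assumptions: `check_prerequisites` is a hypothetical helper that is not part of the repository, and it assumes Ollama is installed as an `ollama` executable on the PATH.

```python
import shutil
import sys


def check_prerequisites(required=(3, 12)):
    """Report whether the interpreter matches the required major.minor
    version and whether an `ollama` executable is found on PATH."""
    return {
        "python_ok": sys.version_info[:2] == tuple(required),
        "ollama_found": shutil.which("ollama") is not None,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```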
Highlighted Details
[...] llm-guided-traversal and triangulation_fulldim, which demonstrated strong performance in benchmarks.
[...] (intra_document, theme_based, sequential_multi_hop) for generating challenging synthetic datasets for evaluation.
query_traversal achieved a 100% win rate on one dataset, while others showed trade-offs between precision and recall.
Maintenance & Community
The repository presents research findings and code under an MIT license. While it builds upon prior work like Microsoft's GraphRAG and Xiaomi's SSG-Retriever, explicit details on ongoing maintenance, community channels, or a roadmap are not provided in the README.
Licensing & Compatibility
The project is released under the permissive MIT license, allowing for free use, modification, and distribution, including for commercial purposes. Teams and organizations are encouraged to fork and build upon the research.
Limitations & Caveats
The ssg_traversal algorithm underperforms because it is query-agnostic. Triangulation algorithms, while precise, may need further tuning to match recall benchmarks. The LLM-guided traversal gains accuracy at the cost of speed and computational resources. The research is positioned as a foundational step with potential for further exploration.
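The precision and recall trade-off mentioned above can be made concrete with a small calculation. The chunk IDs and retrieved sets below are invented for illustration and are not results from the research; `precision_recall` is a hypothetical helper using the standard set-based definitions.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up ground truth: four chunks are relevant to the question.
relevant = {"c1", "c2", "c3", "c4"}

# A selective traversal returns few chunks, all of them relevant.
narrow = precision_recall(["c1", "c2"], relevant)

# A broad traversal finds every relevant chunk but adds noise.
broad = precision_recall(["c1", "c2", "c3", "c4", "c7", "c8", "c9", "c10"], relevant)

print(narrow, broad)  # (1.0, 0.5) (0.5, 1.0)
```

A precise but selective traversal resembles the first case: everything it returns is relevant, but it misses half of what it should find.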
2 months ago
Inactive