similarity-graph-traversal-semantic-rag-research  by glacier-creative-git

Enhancing RAG with novel semantic graph traversal

Created 2 months ago
265 stars

Top 96.5% on SourcePulse

Project Summary

This research addresses limitations in traditional Retrieval Augmented Generation (RAG) systems by proposing novel semantic similarity graph (SSG) traversal algorithms. It aims to provide LLMs with more precise and accurate context by intelligently navigating knowledge bases, benefiting engineers and researchers developing advanced RAG applications.

How It Works

The project constructs a hierarchical SSG where nodes represent document chunks or sentences, connected by cosine similarity edges. Seven distinct traversal algorithms are introduced, including LLM-guided and triangulation-based methods, to navigate this graph. These algorithms move beyond simple vector similarity matching to identify and extract highly relevant information, enhancing RAG system accuracy.
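
To make the idea of cosine-similarity edges and query-aware navigation concrete, here is a minimal sketch. It is not the repository's implementation: the function names build_similarity_graph and greedy_query_traversal, the top_k and max_hops parameters, and the greedy hop rule are all assumptions chosen for illustration, and embeddings are taken to be plain numeric vectors.

```python
import numpy as np


def _unit_rows(vectors):
    """Normalize each row to unit length so dot products equal cosine similarity."""
    arr = np.asarray(vectors, dtype=float)
    return arr / np.linalg.norm(arr, axis=1, keepdims=True)


def build_similarity_graph(embeddings, top_k=2):
    """Hypothetical helper: map each node to its top_k most similar neighbors.

    Returns {node_index: [(neighbor_index, cosine_similarity), ...]}.
    """
    unit = _unit_rows(embeddings)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-edges
    graph = {}
    for i in range(len(unit)):
        neighbors = np.argsort(sims[i])[::-1][:top_k]
        graph[i] = [(int(j), float(sims[i, j])) for j in neighbors]
    return graph


def greedy_query_traversal(graph, embeddings, query, max_hops=3):
    """Hypothetical traversal: start at the node closest to the query, then
    repeatedly hop to the unvisited neighbor most similar to the query."""
    unit = _unit_rows(embeddings)
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    query_sims = unit @ q

    current = int(np.argmax(query_sims))  # anchor node
    visited = [current]
    for _ in range(max_hops):
        candidates = [j for j, _ in graph[current] if j not in visited]
        if not candidates:
            break
        current = max(candidates, key=lambda j: query_sims[j])
        visited.append(current)
    return visited


# Toy 2-D "embeddings" standing in for sentence vectors.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_graph = build_similarity_graph(toy_embeddings, top_k=2)
path = greedy_query_traversal(toy_graph, toy_embeddings, [1.0, 0.0], max_hops=1)
```

The difference from plain vector search is that after the first lookup the walk is restricted to graph edges, so the retrieved context follows chains of related sentences instead of an independent top-k list.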

Quick Start & Requirements

To install, clone the repository, create a Python 3.12 virtual environment, and run pip install -r requirements.txt. The interactive Jupyter notebook research_demonstration.ipynb is the recommended starting point for exploration. Prerequisites are Python 3.12 and Ollama; environment variables for various LLM APIs are optional.

Highlighted Details

  • Introduces seven novel SSG traversal algorithms, including llm-guided-traversal and triangulation_fulldim, which demonstrated strong performance in benchmarks.
  • Features a hierarchical SSG architecture with document, chunk, and sentence levels, incorporating thematic properties.
  • Developed custom context grouping algorithms (intra_document, theme_based, sequential_multi_hop) for generating challenging synthetic datasets for evaluation.
  • Benchmarking indicates query_traversal achieved a 100% win rate on one dataset, while other algorithms showed trade-offs between precision and recall.
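
The three-level hierarchy mentioned in the list above (document, chunk, sentence) could be modeled roughly as follows. This is a sketch under stated assumptions: the class names, the field names, and the choice to attach themes at the document level are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class SentenceNode:
    text: str


@dataclass
class ChunkNode:
    sentences: list[SentenceNode] = field(default_factory=list)


@dataclass
class DocumentNode:
    title: str
    # Assumption: thematic properties live on the document; the real
    # project may attach them at other levels as well.
    themes: list[str] = field(default_factory=list)
    chunks: list[ChunkNode] = field(default_factory=list)

    def iter_sentences(self):
        """Yield (chunk_index, sentence_index, text) for every sentence."""
        for chunk_index, chunk in enumerate(self.chunks):
            for sentence_index, sentence in enumerate(chunk.sentences):
                yield chunk_index, sentence_index, sentence.text


doc = DocumentNode(
    title="example",
    themes=["retrieval"],
    chunks=[
        ChunkNode([SentenceNode("a"), SentenceNode("b")]),
        ChunkNode([SentenceNode("c")]),
    ],
)
flat = list(doc.iter_sentences())
```

Keeping chunk and sentence positions alongside the text lets a traversal step within a chunk, across chunks, or across documents while still knowing where each sentence came from.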

Maintenance & Community

The repository presents research findings and code under an MIT license. While it builds upon prior work like Microsoft's GraphRAG and Xiaomi's SSG-Retriever, explicit details on ongoing maintenance, community channels, or a roadmap are not provided in the README.

Licensing & Compatibility

The project is released under the permissive MIT license, allowing for free use, modification, and distribution, including for commercial purposes. Teams and organizations are encouraged to fork and build upon the research.

Limitations & Caveats

The ssg_traversal algorithm underperforms because it is query-agnostic. The triangulation algorithms are precise but may need further tuning to match recall benchmarks. LLM-guided traversal is accurate but costs more time and compute. The research is positioned as a foundational step with room for further exploration.
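
The precision/recall trade-off and the win-rate figure cited in this summary can be illustrated with the standard set-based definitions. The summary does not state how the project's benchmark computes these metrics, so the functions below are assumptions for illustration only, and the sentence IDs and scores are made-up toy values, not results.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall over retrieved item IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def win_rate(scores_a, scores_b):
    """Fraction of paired questions on which system A strictly beats system B."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and equally long")
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    return wins / len(scores_a)


# Toy values: a broad retriever finds everything relevant but adds noise,
# while a narrow one returns only relevant items and misses one.
broad = precision_recall(["s1", "s2", "s3", "s4"], ["s1", "s2"])
narrow = precision_recall(["s1"], ["s1", "s2"])
```

In this toy case the broad retriever has lower precision and full recall, and the narrow one has full precision and lower recall, which is the shape of trade-off the caveats describe.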

Health Check

Last Commit: 2 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 10 stars in the last 30 days
