LinearRAG  by DEEP-PolyU

Efficient GraphRAG with relation-free graph construction

Created 2 months ago
293 stars

Top 90.2% on SourcePulse

GitHubView on GitHub
Project Summary

A relation-free graph construction method for efficient Graph Retrieval-Augmented Generation (GraphRAG) on large-scale corpora. It targets researchers and practitioners seeking faster, more scalable, and cost-effective RAG solutions by eliminating LLM token costs during graph building.

How It Works

The core innovation is a relation-free graph construction that relies on lightweight entity recognition and semantic linking for contextual comprehension. This approach bypasses explicit relational graphs, enabling deep retrieval via semantic bridging for multi-hop reasoning in a single pass. Key advantages include zero LLM token consumption during graph creation, leading to accelerated processing speeds and linear time/space complexity.

Quick Start & Requirements

  • Installation: Requires pip install -r requirements.txt, downloading a Spacy model (en_core_web_trf or en_core_sci_scibert), setting OPENAI_API_KEY and OPENAI_BASE_URL, cloning datasets from HuggingFace into dataset/, and placing an embedding model (e.g., model/all-mpnet-base-v2/).
  • Prerequisites: Python, specified dependencies, Spacy models, OpenAI API key, embedding model, datasets.
  • Example Run: A detailed bash command is provided using run.py with configurable Spacy model, embedding model, dataset, LLM, and worker count.
  • Links: Datasets available via HuggingFace.

Highlighted Details

  • Eliminates LLM token consumption during graph construction.
  • Achieves linear time and space complexity for high scalability.
  • Enables complex multi-hop reasoning via semantic bridging.
  • Offers faster processing speeds.

Maintenance & Community

Licensing & Compatibility

  • License: GNU General Public License v3.0 (GPL-3.0).
  • Compatibility: GPL-3.0 is a strong copyleft license, requiring adherence to its terms for derivative works, potentially impacting commercial use or closed-source integration.

Limitations & Caveats

Dependency on OpenAI API and specific embedding models may incur costs and limit model choice. The GPL-3.0 license imposes significant obligations, potentially restricting commercial adoption. Setup requires manual download and configuration of multiple external components.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
4
Star History
36 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.