gfm-rag by RManLuo

Advanced RAG powered by Graph Foundation Models

Created 1 year ago
261 stars

Top 97.2% on SourcePulse

Project Summary

GFM-RAG (Graph Foundation Model for Retrieval Augmented Generation) pioneers the use of Graph Foundation Models (GFMs) in Retrieval Augmented Generation (RAG), using Graph Neural Networks (GNNs) to reason over structured knowledge. It targets complex question answering by enabling efficient multi-hop retrieval, and is aimed at researchers and engineers working on knowledge-intensive tasks.

How It Works

The pipeline constructs a "universal graph index" from documents to capture relational knowledge. A pre-trained GFM retriever, based on GNNs, reasons over this graph for relevant document retrieval. This approach facilitates efficient, single-step multi-hop reasoning, a key advantage over traditional RAG. The GFM retriever is designed for generalizability, applicable to unseen datasets without fine-tuning.
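To make "single-step multi-hop" retrieval concrete, here is a minimal toy sketch. It is not the gfmrag API and not the project's trained GNN: it assumes a hand-built entity co-occurrence graph and replaces the learned retriever with plain score propagation. All names (`DOCS`, `build_graph_index`, `retrieve`, `hops`, `decay`) are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> entities mentioned in it.
DOCS = {
    "d1": ["Marie Curie", "Pierre Curie"],
    "d2": ["Pierre Curie", "Sorbonne"],
    "d3": ["Sorbonne", "Paris"],
    "d4": ["Albert Einstein", "Princeton"],
}


def build_graph_index(docs):
    """Link entities that co-occur in a document and record entity -> documents."""
    neighbors = defaultdict(set)
    entity_docs = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            entity_docs[entity].add(doc_id)
            neighbors[entity].update(e for e in entities if e != entity)
    return neighbors, entity_docs


def retrieve(query_entities, neighbors, entity_docs, hops=2, decay=0.5, top_k=3):
    """Rank documents in one call by propagating query relevance over the graph.

    Stand-in for a learned GNN: each round passes a decayed share of an
    entity's score to its neighbors, so `hops` rounds reach `hops`-hop evidence.
    """
    scores = {e: 1.0 for e in query_entities if e in entity_docs}
    frontier = dict(scores)
    for _ in range(hops):
        next_frontier = defaultdict(float)
        for entity, score in frontier.items():
            for neighbor in neighbors[entity]:
                next_frontier[neighbor] += decay * score / len(neighbors[entity])
        for entity, score in next_frontier.items():
            scores[entity] = scores.get(entity, 0.0) + score
        frontier = next_frontier

    # A document's score is the sum of the scores of the entities it mentions.
    doc_scores = defaultdict(float)
    for entity, score in scores.items():
        for doc_id in entity_docs[entity]:
            doc_scores[doc_id] += score
    return sorted(doc_scores, key=lambda d: (-doc_scores[d], d))[:top_k]


neighbors, entity_docs = build_graph_index(DOCS)
print(retrieve(["Marie Curie"], neighbors, entity_docs, hops=2))
# ['d1', 'd2', 'd3']
```

With `hops=1` the same query returns only `d1` and `d2`; the second round of propagation is what surfaces `d3`, which shares no entity with any document that mentions the query entity. The caller still issues a single retrieval call rather than iterating retrieve-then-reason steps.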

Quick Start & Requirements

Installation requires Python 3.12 and CUDA 12+ (12.6.3 recommended), managed via Conda for CUDA toolkit installation. The gfmrag package is installed via pip. Full documentation is available at https://rmanluo.github.io/gfm-rag/.
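A small pre-install check can catch mismatched environments early. The sketch below is a hypothetical helper, not part of gfmrag. It assumes "Python 3.12" means any 3.12.x release and that the caller supplies the CUDA toolkit version as a string (for example, as reported by the toolkit's own tools).

```python
import sys


def check_requirements(python_version, cuda_version):
    """Return a list of problems; an empty list means the stated requirements are met.

    python_version: tuple-like such as sys.version_info or (3, 12, 4).
    cuda_version:   string such as "12.6.3", or None if no toolkit was found.
    """
    problems = []
    if tuple(python_version[:2]) != (3, 12):
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python 3.12 required, found {found}")
    if cuda_version is None:
        problems.append("CUDA toolkit not found (12+ required)")
    elif int(cuda_version.split(".")[0]) < 12:
        problems.append(f"CUDA 12+ required, found {cuda_version}")
    return problems


# Example: check the running interpreter against the recommended CUDA version.
for problem in check_requirements(sys.version_info, "12.6.3"):
    print("WARNING:", problem)
```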

Highlighted Details

  • GFM Retriever: GNN-based for reasoning over graph-indexed knowledge.
  • Universal Graph Index: Supports diverse knowledge structures (KGs, Document Graphs, Hierarchical Graphs).
  • Efficiency: Multi-hop reasoning via single-step retrieval.
  • Generalizability & Transferability: Works on unseen datasets; supports domain-specific fine-tuning.
  • Compatibility: Integrates with arbitrary agent-based frameworks.
  • Interpretability: Illustrates captured reasoning paths.

Maintenance & Community

Recent activity includes the release of the G-reasoner codebase (34M model) and paper acceptances at ICLR 2026 and NeurIPS 2025. A new GFM-RAG version (pre-trained on 286 KGs) was released in June 2025. No community links are provided.

Licensing & Compatibility

The README omits specific license information, posing a significant adoption risk, especially for commercial use. The framework is compatible with arbitrary agent-based systems.

Limitations & Caveats

The lack of a stated license is the primary adoption blocker. Although the retriever generalizes to unseen datasets, optimal performance on niche domains may require fine-tuning. Setup requires Python 3.12 and CUDA 12 or later. The project's research focus, indicated by recent conference acceptances, suggests it may not be production-ready.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 11 stars in the last 30 days
