G-Retriever by XiaoxinHe

Research paper implementation for graph-based question answering

Created 2 years ago

522 stars

Top 60.2% on SourcePulse

1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

G-Retriever is a framework for understanding and answering questions about real-world textual graphs. It targets researchers and practitioners working on scene graph understanding, common sense reasoning, and knowledge graph reasoning, and it improves graph comprehension by combining GNNs, LLMs, and RAG.

How It Works

G-Retriever combines Graph Neural Networks (GNNs) for graph representation, Large Language Models (LLMs) for generation, and Retrieval-Augmented Generation (RAG) for selecting relevant graph context. The graph representation is passed to the LLM through soft prompting, so the model can reason over graph structure while the LLM itself stays frozen or is fine-tuned with LoRA, depending on the training configuration. The aim is higher accuracy on graph question answering tasks.
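The soft-prompting step described above can be illustrated with a toy sketch. It assumes that node embeddings from a GNN are mean-pooled into one graph vector, projected into the LLM's token-embedding space with a single linear map, and prepended to the question's token embeddings. The function names, the pooling choice, and the weights are hypothetical and chosen for illustration; a real implementation would use PyTorch tensors and trained parameters.

```python
def mean_pool(node_embeddings):
    """Average all node vectors into one graph-level vector."""
    count = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(vec[i] for vec in node_embeddings) / count for i in range(dim)]


def project(vector, weight):
    """Apply a linear map; weight has shape (llm_dim, graph_dim)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def build_llm_input(node_embeddings, token_embeddings, weight):
    """Prepend the projected graph vector as a single soft-prompt token."""
    graph_vector = mean_pool(node_embeddings)
    soft_prompt = project(graph_vector, weight)
    return [soft_prompt] + token_embeddings


# Toy data (hypothetical): 2 nodes with 2-dim GNN outputs, 3-dim LLM embeddings.
node_embeddings = [[1.0, 3.0], [3.0, 5.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

llm_input = build_llm_input(node_embeddings, token_embeddings, weight)
print(len(llm_input), llm_input[0])
```

The LLM then consumes this sequence in place of plain token embeddings, which is why the graph signal can be learned without changing the LLM's own weights.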

Quick Start & Requirements

Install: Requires PyTorch 2.0.1 with CUDA 11.8, PyG libraries, peft, pandas, ogb, transformers, wandb, sentencepiece, datasets, pcst_fast, gensim, scipy==1.12, and protobuf.
LLM: Access to Llama 2 (7b-hf) via Hugging Face, requiring a Hugging Face account and access token.
Data Preprocessing: Commands provided for expla_graphs, scene_graphs, and webqsp datasets.
Training: Scripts for inference-only LLM, frozen LLM with prompt tuning, fine-tuned LLM with LoRA, and G-Retriever with LoRA.
Reproducibility: A run.sh script is available for reproducing paper results.
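Before running the preprocessing or training scripts, it can help to confirm that the listed dependencies are installed. The sketch below is not part of the repository. It assumes the pip distribution names shown in the list (for example `torch_geometric` for the PyG libraries) and assumes the Hugging Face access token is exposed through the `HF_TOKEN` environment variable; adjust both if your setup differs.

```python
import os
from importlib import metadata

# Distribution names are assumptions mapped from the requirement list above.
REQUIRED = [
    "torch", "torch_geometric", "peft", "pandas", "ogb", "transformers",
    "wandb", "sentencepiece", "datasets", "pcst_fast", "gensim", "scipy",
    "protobuf",
]
# Versions the summary pins explicitly.
PINNED = {"torch": "2.0.1", "scipy": "1.12"}


def check_environment(required=REQUIRED, pinned=PINNED):
    """Return a {package: status} report for the current Python environment."""
    report = {}
    for name in required:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        wanted = pinned.get(name)
        base = installed.split("+")[0]  # drop local tags such as "+cu118"
        if wanted and not (base == wanted or base.startswith(wanted + ".")):
            report[name] = f"installed {installed}, expected {wanted}"
        else:
            report[name] = f"ok ({installed})"
    return report


def has_hf_token(env=os.environ):
    """True if a non-empty Hugging Face token is set (assumed variable: HF_TOKEN)."""
    return bool(env.get("HF_TOKEN"))


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:16s} {status}")
    print("HF_TOKEN set:", has_hf_token())
```

The check only reads package metadata, so it does not verify the CUDA 11.8 build or that the token has been granted access to the Llama 2 weights.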

Highlighted Details

Integrates GNNs, LLMs, and RAG for textual graph QA.
Supports fine-tuning via soft prompting for enhanced graph understanding.
Official implementation of the NeurIPS 2024 paper "G-Retriever".
PyG 2.6 compatibility noted.

Maintenance & Community

The project accompanies a NeurIPS 2024 publication.
No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

The README does not state a license, so the terms for commercial use or closed-source linking are unspecified. The repository name and structure suggest it is intended for research use.

Limitations & Caveats

Setup requires specific versions of PyTorch and CUDA, plus access to Llama 2 models, which in turn requires a Hugging Face account and access token. The README reports no performance benchmarks beyond those claimed in the paper, and its documentation is limited to setup and training commands.

Health Check

Last Commit

9 months ago

Responsiveness

1 week

Star History

9 stars in the last 30 days