GraphGPT by HKUDS

Graph instruction tuning research paper for LLMs

Created 1 year ago · 767 stars · Top 46.4% on sourcepulse

Project Summary

GraphGPT is a framework for aligning Large Language Models (LLMs) with graph structural knowledge through a dual-stage graph instruction tuning paradigm. It enables LLMs to understand and reason about graph structure, and targets researchers and practitioners who want to apply LLMs to graph-structured data for tasks such as node classification and link prediction.

How It Works

GraphGPT employs a text-graph grounding paradigm to encode graph structures into the LLM's natural language space. This involves a graph transformer that is pre-trained to align textual descriptions with graph structures. The core innovation is a dual-stage instruction tuning process: first, self-supervised tuning on general graph instructions, followed by task-specific tuning (e.g., node classification, link prediction) using Chain-of-Thought (CoT) distillation for improved reasoning.
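
The Highlighted Details below note that the graph transformer is wired into the LLM via a learned projector. A minimal PyTorch sketch of that idea follows; the class name, dimensions, and single-linear-layer design are assumptions for illustration, not GraphGPT's actual module:

```python
import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    """Hypothetical sketch: map graph-transformer node embeddings into the
    LLM's hidden space so they can be spliced in as 'graph tokens'."""

    def __init__(self, graph_dim: int = 128, llm_dim: int = 4096):
        super().__init__()
        # A single linear layer is the simplest possible projector;
        # the real module may be deeper or trained differently.
        self.proj = nn.Linear(graph_dim, llm_dim)

    def forward(self, node_embeddings: torch.Tensor) -> torch.Tensor:
        # node_embeddings: (num_nodes, graph_dim) from a pre-trained graph transformer
        # returns:         (num_nodes, llm_dim) pseudo-token embeddings for the LLM
        return self.proj(node_embeddings)

projector = GraphTokenProjector()
graph_tokens = projector(torch.randn(10, 128))  # 10 nodes -> 10 graph tokens
print(graph_tokens.shape)                       # torch.Size([10, 4096])
```

During tuning, projected embeddings of this kind would be interleaved with the text token embeddings before the LLM's forward pass.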

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using pip install -r requirements.txt.
  • Prerequisites: PyTorch 2.1+ with CUDA 11.8 (or 11.7 for older versions), torch_geometric and related packages, fschat for Vicuna base model.
  • Data/Models: Requires downloading Vicuna weights, a pre-trained graph transformer, and graph data (all_graph_data.pt); see the loading sketch after this list.
  • Training: Two main scripts (graphgpt_stage1.sh, graphgpt_stage2.sh) orchestrate the two-stage tuning process.
  • Resources: Training can run on multiple GPUs (e.g., 2× RTX 3090 for lightweight tuning).
  • Links: Hugging Face models, Paper.
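
After downloading the artifacts above, a quick sanity check is to load the graph data file named in the README. A minimal sketch, assuming all_graph_data.pt is an ordinary PyTorch save file (its internal structure is not documented here and should be inspected):

```python
import torch

# all_graph_data.pt is the graph data file named in the README.
# Whether it holds a dict, a PyG Data object, etc. is an assumption to verify.
# weights_only=False permits arbitrary pickled objects (required on newer
# PyTorch for non-tensor payloads); only do this with files you trust.
graph_data = torch.load("all_graph_data.pt", map_location="cpu", weights_only=False)
print(type(graph_data))  # inspect the structure before wiring it into training
```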

Highlighted Details

  • Paper accepted at SIGIR'24.
  • Supports efficient, lightweight training on consumer GPUs.
  • Integrates graph transformers with LLMs via a learned projector.
  • Utilizes Chain-of-Thought (CoT) distillation for enhanced reasoning; an illustrative sample follows this list.
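
To make the CoT distillation bullet concrete, here is a hypothetical instruction record of the kind such tuning consumes: a task instruction, a graph placeholder, and a distilled step-by-step rationale. All field names and content are invented for illustration and do not reflect the repository's actual data schema:

```python
# Hypothetical CoT-distilled training sample; every field name is an assumption.
cot_sample = {
    "instruction": "Given the citation neighborhood of the target paper, classify its topic.",
    "graph": "<graph>",  # placeholder where projected graph tokens would be spliced in
    "response": (
        "Step 1: Most neighbors cite reinforcement-learning papers. "
        "Step 2: The title mentions policy optimization. "
        "Answer: Reinforcement Learning."
    ),
}
```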

Maintenance & Community

  • Active development with recent updates for PyTorch 2.1+ compatibility.
  • FAQ and issue tracking available on GitHub.
  • Contact: Jiabin Tang.

Licensing & Compatibility

  • The repository appears to be permissively licensed, but the README does not state a license explicitly; the underlying Vicuna weights and other dependencies carry their own license terms.

Limitations & Caveats

  • Requires specific versions of PyTorch and CUDA, with potential compatibility issues for older setups.
  • The README notes potential compatibility issues with FlashAttention and suggests workarounds.
  • Training requires significant computational resources and careful setup of base models and data.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

42 stars in the last 90 days
