GraphGPT by HKUDS

Graph instruction tuning framework for LLMs (SIGIR'24 research paper)

Created 1 year ago
780 stars

Top 44.9% on SourcePulse

Project Summary

GraphGPT is a framework for aligning Large Language Models (LLMs) with graph structural knowledge through a dual-stage graph instruction tuning paradigm. It targets researchers and practitioners working with graph-structured data who want to leverage LLMs for tasks like node classification and link prediction, enabling LLMs to understand and reason about graph properties.
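To make the setup concrete, here is a rough illustration of what a graph instruction-tuning record might look like: a serialized subgraph paired with a natural-language task prompt and target response. The field names and wording are hypothetical, not the repository's actual data schema.

```python
# Hypothetical graph-instruction record -- field names and prompt wording
# are illustrative only, not GraphGPT's actual data format.
sample = {
    "graph": {
        "node_ids": [0, 1, 2, 3],
        "edges": [(0, 1), (1, 2), (2, 3), (3, 0)],  # e.g., citation links
        "node_text": ["paper A abstract", "paper B abstract",
                      "paper C abstract", "paper D abstract"],
    },
    "instruction": "Given the citation graph above, classify node 0 "
                   "into one of: {Theory, Systems, ML}.",
    "response": "Node 0 cites mostly ML papers, so the label is ML.",
}
```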

How It Works

GraphGPT employs a text-graph grounding paradigm to encode graph structures into the LLM's natural language space. This involves a graph transformer that is pre-trained to align textual descriptions with graph structures. The core innovation is a dual-stage instruction tuning process: first, self-supervised tuning on general graph instructions, followed by task-specific tuning (e.g., node classification, link prediction) using Chain-of-Thought (CoT) distillation for improved reasoning.
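The following is a minimal sketch of the alignment idea: a pre-trained graph transformer produces node embeddings, and a learned projector maps them into the LLM's token-embedding space so "graph tokens" can be prepended to the embedded instruction. The class name, dimensions, and single-linear-layer design are assumptions for illustration, not the repository's actual API.

```python
import torch
import torch.nn as nn

class GraphProjector(nn.Module):
    """Maps graph-encoder node embeddings into the LLM token-embedding space.
    Minimal sketch: the real GraphGPT projector may differ in depth/shape."""
    def __init__(self, graph_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(graph_dim, llm_dim)

    def forward(self, node_embs: torch.Tensor) -> torch.Tensor:
        # node_embs: (num_nodes, graph_dim) from a frozen graph transformer
        return self.proj(node_embs)  # (num_nodes, llm_dim) "graph tokens"

# Illustrative usage: prepend projected graph tokens to the text embeddings,
# so the LLM attends jointly over structure and instruction.
graph_dim, llm_dim = 128, 4096          # hypothetical sizes
node_embs = torch.randn(30, graph_dim)  # stand-in for graph-encoder output
text_embs = torch.randn(64, llm_dim)    # stand-in for embedded instruction

projector = GraphProjector(graph_dim, llm_dim)
inputs = torch.cat([projector(node_embs), text_embs], dim=0)  # (94, llm_dim)
```

During the two tuning stages, only lightweight components such as this projector need to be trained while the LLM and graph encoder stay largely frozen, which is what makes the lightweight multi-GPU setup described below feasible.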

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using pip install -r requirements.txt.
  • Prerequisites: PyTorch 2.1+ with CUDA 11.8 (or 11.7 for older versions), torch_geometric and related packages, and fschat for the Vicuna base model.
  • Data/Models: Requires downloading Vicuna weights, a pre-trained graph transformer, and graph data (all_graph_data.pt).
  • Training: Two main scripts (graphgpt_stage1.sh, graphgpt_stage2.sh) orchestrate the two-stage tuning process; a condensed command sketch follows this list.
  • Resources: Training can be performed on multiple GPUs (e.g., 2× RTX 3090 for lightweight tuning).
  • Links: Hugging Face models, Paper.
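A condensed command sketch of the steps above. The clone URL and script paths are assumptions based on this summary, not verified against the repository layout; model and data files must be obtained separately.

```bash
# Hedged quick-start sketch -- clone URL and script paths are assumptions.
git clone https://github.com/HKUDS/GraphGPT.git
cd GraphGPT
pip install -r requirements.txt

# Obtain Vicuna weights, the pre-trained graph transformer, and
# all_graph_data.pt separately (see the repository README for links).

# Stage 1: self-supervised graph instruction tuning
bash graphgpt_stage1.sh

# Stage 2: task-specific tuning with CoT distillation
bash graphgpt_stage2.sh
```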

Highlighted Details

  • SIGIR'24 accepted paper.
  • Supports efficient, lightweight training on consumer GPUs.
  • Integrates graph transformers with LLMs via a learned projector.
  • Utilizes Chain-of-Thought (CoT) distillation for enhanced reasoning.

Maintenance & Community

  • Active development with recent updates for PyTorch 2.1+ compatibility.
  • FAQ and issue tracking available on GitHub.
  • Contact: Jiabin Tang.
  • Links: GitHub Stars, Star History.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but the underlying Vicuna model and other dependencies may have their own licenses. Specific license details for GraphGPT are not explicitly stated in the README.

Limitations & Caveats

  • Requires specific versions of PyTorch and CUDA, with potential compatibility issues for older setups.
  • The README mentions potential compatibility issues with FlashAttention and suggests workarounds.
  • Training requires significant computational resources and careful setup of base models and data.
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Junyang Lin (core maintainer at Alibaba Qwen), and 3 more.

Alpaca-CoT by PhoebusSi

IFT platform for instruction collection, parameter-efficient methods, and LLMs
3k stars · Top 0.1% on SourcePulse
Created 2 years ago · Updated 1 year ago