GraphGPT by HKUDS

Graph instruction tuning framework for LLMs (SIGIR'24 research paper)

Created 1 year ago
780 stars

Top 44.9% on SourcePulse

Project Summary

GraphGPT is a framework for aligning Large Language Models (LLMs) with graph structural knowledge through a dual-stage graph instruction tuning paradigm. It targets researchers and practitioners working with graph-structured data who want to leverage LLMs for tasks like node classification and link prediction, enabling LLMs to understand and reason about graph properties.
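To make the setup concrete, here is a rough illustration of what a graph instruction-tuning record might look like: a serialized subgraph paired with a natural-language task prompt and target response. The field names and wording are hypothetical, not the repository's actual data schema.

```python
# Hypothetical graph-instruction record -- field names and prompt wording
# are illustrative only, not GraphGPT's actual data format.
sample = {
    "graph": {
        "node_ids": [0, 1, 2, 3],
        "edges": [(0, 1), (1, 2), (2, 3), (3, 0)],  # e.g., citation links
        "node_text": ["paper A abstract", "paper B abstract",
                      "paper C abstract", "paper D abstract"],
    },
    "instruction": "Given the citation graph above, classify node 0 "
                   "into one of: {Theory, Systems, ML}.",
    "response": "Node 0 cites mostly ML papers, so the label is ML.",
}
```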

How It Works

GraphGPT employs a text-graph grounding paradigm to encode graph structures into the LLM's natural language space. This involves a graph transformer that is pre-trained to align textual descriptions with graph structures. The core innovation is a dual-stage instruction tuning process: first, self-supervised tuning on general graph instructions, followed by task-specific tuning (e.g., node classification, link prediction) using Chain-of-Thought (CoT) distillation for improved reasoning.
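The following is a minimal sketch of the alignment idea: a pre-trained graph transformer produces node embeddings, and a learned projector maps them into the LLM's token-embedding space so "graph tokens" can be prepended to the embedded instruction. The class name, dimensions, and single-linear-layer design are assumptions for illustration, not the repository's actual API.

```python
import torch
import torch.nn as nn

class GraphProjector(nn.Module):
    """Maps graph-encoder node embeddings into the LLM token-embedding space.
    Minimal sketch: the real GraphGPT projector may differ in depth/shape."""
    def __init__(self, graph_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(graph_dim, llm_dim)

    def forward(self, node_embs: torch.Tensor) -> torch.Tensor:
        # node_embs: (num_nodes, graph_dim) from a frozen graph transformer
        return self.proj(node_embs)  # (num_nodes, llm_dim) "graph tokens"

# Illustrative usage: prepend projected graph tokens to the text embeddings,
# so the LLM attends jointly over structure and instruction.
graph_dim, llm_dim = 128, 4096          # hypothetical sizes
node_embs = torch.randn(30, graph_dim)  # stand-in for graph-encoder output
text_embs = torch.randn(64, llm_dim)    # stand-in for embedded instruction

projector = GraphProjector(graph_dim, llm_dim)
inputs = torch.cat([projector(node_embs), text_embs], dim=0)  # (94, llm_dim)
```

During the two tuning stages, only lightweight components such as this projector need to be trained while the LLM and graph encoder stay largely frozen, which is what makes the lightweight multi-GPU setup described below feasible.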

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using pip install -r requirements.txt.
  • Prerequisites: PyTorch 2.1+ with CUDA 11.8 (or 11.7 for older versions), torch_geometric and related packages, and fschat for the Vicuna base model.
  • Data/Models: Requires downloading Vicuna weights, a pre-trained graph transformer, and graph data (all_graph_data.pt).
  • Training: Two main scripts (graphgpt_stage1.sh, graphgpt_stage2.sh) orchestrate the two-stage tuning process; a condensed command sketch follows this list.
  • Resources: Training can be performed on multiple GPUs (e.g., 2× RTX 3090 for lightweight tuning).
  • Links: Hugging Face models, Paper.
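A condensed command sketch of the steps above. The clone URL and script paths are assumptions based on this summary, not verified against the repository layout; model and data files must be obtained separately.

```bash
# Hedged quick-start sketch -- clone URL and script paths are assumptions.
git clone https://github.com/HKUDS/GraphGPT.git
cd GraphGPT
pip install -r requirements.txt

# Obtain Vicuna weights, the pre-trained graph transformer, and
# all_graph_data.pt separately (see the repository README for links).

# Stage 1: self-supervised graph instruction tuning
bash graphgpt_stage1.sh

# Stage 2: task-specific tuning with CoT distillation
bash graphgpt_stage2.sh
```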

Highlighted Details

  • SIGIR'24 accepted paper.
  • Supports efficient, lightweight training on consumer GPUs.
  • Integrates graph transformers with LLMs via a learned projector.
  • Utilizes Chain-of-Thought (CoT) distillation for enhanced reasoning.

Maintenance & Community

  • Active development with recent updates for PyTorch 2.1+ compatibility.
  • FAQ and issue tracking available on GitHub.
  • Contact: Jiabin Tang.
  • Links: GitHub Stars, Star History.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but the underlying Vicuna model and other dependencies may have their own licenses. Specific license details for GraphGPT are not explicitly stated in the README.

Limitations & Caveats

  • Requires specific versions of PyTorch and CUDA, with potential compatibility issues for older setups.
  • The README mentions potential compatibility issues with FlashAttention and suggests workarounds.
  • Training requires significant computational resources and careful setup of base models and data.
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Junyang Lin (core maintainer at Alibaba Qwen), and 3 more.

Alpaca-CoT by PhoebusSi

IFT platform for instruction collection, parameter-efficient methods, and LLMs
3k stars · Top 0.1% on SourcePulse
Created 2 years ago · Updated 1 year ago