TAPE by XiaoxinHe

Enhanced graph representation learning via LLM-to-LM interpretation

Created 2 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

This repository provides the official implementation of the ICLR 2024 paper "Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning." It enhances graph representation learning by prompting Large Language Models (LLMs) to generate explanations for node text attributes, then using a smaller language model as an interpreter that turns those explanations into node features. The project offers a framework for fine-tuning language models and training Graph Neural Networks (GNNs) on these enriched features, aiming to improve performance on text-attributed graph tasks.

How It Works

The core approach is an LLM-to-LM interpreter pipeline. An LLM is prompted with the text attributes of each graph node and produces predictions together with natural-language explanations. These explanations, alongside the original text, are used to fine-tune smaller language models, whose embeddings then serve as enriched node features for GNN architectures. The method aims to capture deeper semantic signal from text than raw attributes or standard embeddings alone, leading to more effective graph representation learning. A minimal sketch of this pipeline follows.
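The sketch below illustrates the two-stage idea, assuming Hugging Face transformers and PyTorch Geometric (both listed in the prerequisites). The model name, toy graph, and shapes are illustrative placeholders, not the repository's exact code.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from torch_geometric.nn import GCNConv

# 1) Encode node text (original abstracts and/or LLM explanations) with a
#    (fine-tuned) language model; mean-pool into one vector per node.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder LM
lm = AutoModel.from_pretrained("bert-base-uncased")

texts = ["Paper abstract ...", "LLM explanation of the paper ..."]
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = lm(**batch).last_hidden_state   # [num_nodes, seq_len, dim]
    x = hidden.mean(dim=1)                   # [num_nodes, dim]

# 2) Feed the LM-derived node features into a GNN over the graph.
edge_index = torch.tensor([[0, 1], [1, 0]])  # toy 2-node citation graph
gnn = GCNConv(x.size(-1), 40)                # e.g. 40 classes for ogbn-arxiv
logits = gnn(x, edge_index)                  # per-node class logits
```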

Quick Start & Requirements

  • Primary install/run command: environment setup is handled via Conda; the README lists the exact commands.
  • Prerequisites: Python 3.8, PyTorch 1.12.1, Torchvision 0.13.1, Torchaudio 0.12.1, CUDA toolkit 11.3, PyTorch Geometric (sparse, scatter, cluster), DGL (cu113), OGB, yacs, transformers, accelerate. A quick sanity-check sketch follows this list.
  • Datasets: Requires downloading original text attributes and LLM responses for datasets like ogbn-arxiv, ogbn-products (subset), arxiv_2023, Cora, and PubMed. Links are provided within the README.
  • Setup: Detailed commands for environment setup, dataset preparation, LM fine-tuning, and GNN training are provided.
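As a convenience, the following sketch verifies the pinned versions and loads the ogbn-arxiv graph via OGB (listed in the prerequisites). It is not part of the repository; the original text attributes and LLM responses must still be downloaded separately as described in the README.

```python
import torch
from ogb.nodeproppred import PygNodePropPredDataset

# Verify the pinned PyTorch/CUDA versions from the prerequisites list.
assert torch.__version__.startswith("1.12.1"), torch.__version__
assert torch.version.cuda == "11.3", torch.version.cuda

# Load the ogbn-arxiv graph structure via OGB (downloads on first use).
dataset = PygNodePropPredDataset(name="ogbn-arxiv")
graph = dataset[0]               # PyG Data object: x, edge_index, y
split = dataset.get_idx_split()  # train/valid/test node indices
```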

Highlighted Details

  • Supports fine-tuning LMs using either original text attributes or LLM-generated responses.
  • Enables training GNNs with various architectures (MLP, GCN, SAGE, RevGAT) and feature types (TA_P_E, TA, E, P, ogb); a sketch of the TA_P_E ensemble follows this list.
  • Provides pre-trained checkpoints and TAPE features for reproducibility.
  • Includes scripts for constructing and processing the arxiv-2023 dataset.
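As a rough illustration of the TA_P_E feature type, the sketch below averages class probabilities from models trained on each of the three feature sources, following the ensembling described in the paper. The function and tensor names are hypothetical, not the repository's API.

```python
import torch

def tape_ensemble(logits_ta, logits_p, logits_e):
    """Average class probabilities from models trained on the three
    feature sources: TA (original text), P (LLM predictions), and
    E (LLM explanations)."""
    probs = [torch.softmax(l, dim=-1) for l in (logits_ta, logits_p, logits_e)]
    return torch.stack(probs).mean(dim=0)

# Toy usage: 3 nodes, 4 classes.
fused = tape_ensemble(torch.randn(3, 4), torch.randn(3, 4), torch.randn(3, 4))
pred = fused.argmax(dim=-1)  # final per-node class prediction
```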

Maintenance & Community

Information regarding specific maintainers, community channels (like Discord/Slack), or a public roadmap is not explicitly detailed in the provided README. The project is associated with the ICLR 2024 paper.

Licensing & Compatibility

The license type is not specified in the provided README content. Compatibility for commercial use or closed-source linking cannot be determined without explicit licensing information.

Limitations & Caveats

The setup requires specific versions of PyTorch and CUDA (11.3), which might pose compatibility challenges with newer hardware or existing environments. The README does not detail any known bugs, alpha status, or unsupported platforms.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
