LLMRec by HKUDS

Recommender system enhanced via LLM graph augmentation (WSDM'24 paper)

Created 2 years ago

501 stars

Top 62.1% on SourcePulse

Project Summary

LLMRec is a framework that uses Large Language Models (LLMs) to augment the interaction graphs behind recommender systems. It targets researchers and practitioners who want to improve model performance by incorporating textual and multi-modal content. The primary benefit is improved recommendation accuracy through LLM-driven graph enrichment.

How It Works

LLMRec enhances recommendation models with three LLM-based graph augmentation strategies: reinforcing user-item interaction edges, enriching item node attributes with LLM-generated text, and building user profiles from interaction history. Because the augmentation is expressed in natural language, the graph can carry preference and relationship signals that the raw interaction data does not contain.
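The three strategies above can be illustrated with a toy sketch. This is not the repository's code: every function name, the prompt wording, the candidate-pool idea, and the toy data are assumptions made for illustration, and `stub_llm` is a deterministic placeholder standing in for a real LLM call.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM: returns the last token of the prompt."""
    return prompt.split()[-1]


def augment_edges(interactions: Dict[str, List[str]],
                  candidates: Dict[str, List[str]],
                  titles: Dict[str, str],
                  llm: LLM) -> Dict[str, List[str]]:
    """Strategy 1 (sketch): add one LLM-chosen candidate as a new user-item edge."""
    augmented = {}
    for user, items in interactions.items():
        pool = candidates.get(user, [])
        watched = ", ".join(titles[i] for i in items)
        prompt = f"User watched: {watched}. Pick one id from: {' '.join(pool)}"
        choice = llm(prompt) if pool else ""
        # Keep the reply only if it names a valid candidate the user has not seen.
        extra = [choice] if choice in pool and choice not in items else []
        augmented[user] = list(items) + extra
    return augmented


def augment_item_attributes(titles: Dict[str, str], llm: LLM) -> Dict[str, str]:
    """Strategy 2 (sketch): attach LLM-generated text to each item node."""
    return {item: llm(f"Describe the item titled {title}")
            for item, title in titles.items()}


def build_user_profiles(interactions: Dict[str, List[str]],
                        titles: Dict[str, str],
                        llm: LLM) -> Dict[str, str]:
    """Strategy 3 (sketch): derive a textual profile from each user's history."""
    profiles = {}
    for user, items in interactions.items():
        watched = ", ".join(titles[i] for i in items)
        profiles[user] = llm(f"Summarize the taste of a user who watched {watched}")
    return profiles


# Toy data (invented for this example).
titles = {"i1": "Heat", "i2": "Alien", "i3": "Ronin", "i4": "Solaris"}
interactions = {"u1": ["i1", "i2"], "u2": ["i2"]}
candidates = {"u1": ["i3", "i4"], "u2": ["i1", "i3"]}

new_edges = augment_edges(interactions, candidates, titles, stub_llm)
item_text = augment_item_attributes(titles, stub_llm)
profiles = build_user_profiles(interactions, titles, stub_llm)
```

Passing the LLM in as a callable keeps the graph logic testable without API access; a real client would replace `stub_llm`, and its free-text replies would need parsing and validation, as the candidate check in `augment_edges` hints.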

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python, PyTorch. LLM augmentation stages require API access or pre-generated data.
Usage:
1. LLM Augmentation: python ./gpt_ui_aug.py, python ./gpt_user_profiling.py, python ./gpt_i_attribute_generate_aug.py
2. Training: python ./main.py --dataset {netflix, movielens} (pass one of the two dataset names)
Data: Pre-processed multi-modal datasets (Netflix, MovieLens) with LLM-augmented text and embeddings are available for download.
Links: Netflix Dataset
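The usage steps above can be sketched as a small helper that assembles the commands in order. The script names and the `--dataset` flag come from the listing; the function name `build_pipeline`, the `run_augmentation` switch, and the assumption that augmentation can be skipped when the pre-generated data is used are illustrative only.

```python
from typing import List

# Script names as given in the usage steps.
AUGMENTATION_SCRIPTS = [
    "./gpt_ui_aug.py",
    "./gpt_user_profiling.py",
    "./gpt_i_attribute_generate_aug.py",
]
DATASETS = ("netflix", "movielens")


def build_pipeline(dataset: str, run_augmentation: bool = False) -> List[List[str]]:
    """Return the commands to run, in order, as argument lists (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    commands: List[List[str]] = []
    if run_augmentation:
        # Assumed optional: skipped when pre-generated augmentation data is used.
        commands += [["python", script] for script in AUGMENTATION_SCRIPTS]
    commands.append(["python", "./main.py", "--dataset", dataset])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root; building the commands separately from running them makes it easy to inspect the plan before any API calls are made.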

Highlighted Details

Implements LLM-based augmentation for user-item edges, item attributes, and user profiles.
Provides multi-modal datasets (text, images) for Netflix and MovieLens.
Utilizes CLIP-ViT and Sentence-BERT for visual and textual feature encoding.
Codebase is structured based on MMSSL, LATTICE, and MICRO.

Maintenance & Community

The project is associated with the University of Hong Kong and Baidu Inc. The repository was last updated in March 2024.

Licensing & Compatibility

The repository does not explicitly state a license. The provided datasets are intended for research use, and the authors ask that the paper be cited if the 'netflix' dataset is used.

Limitations & Caveats

Running the LLM augmentation stages directly, rather than using the pre-generated data, may incur significant API costs or computational resources. The provided baseline code (LATTICE, MMSSL) needs minor modifications to adjust dataset paths.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days