GRID by snap-research

Generative recommendation with semantic IDs

Created 3 months ago
300 stars

Top 88.8% on SourcePulse

Project Summary

GRID (Generative Recommendation with Semantic IDs) is a framework for generative recommendation built on semantic IDs derived from item text embeddings. Aimed at researchers and engineers working on recommendation systems, it converts item text into embeddings, learns hierarchical semantic IDs from those embeddings, and uses transformer models to generate recommendations as sequences of semantic ID tokens.

How It Works

GRID's pipeline has three steps:

  1. Embedding Generation: Item text is converted into embeddings using any Hugging Face-compatible LLM.
  2. Semantic ID Learning: The item embeddings are quantized into hierarchical semantic IDs using residual quantization methods (RQ-KMeans, RQ-VAE, RVQ).
  3. Generative Recommendations: Transformer architectures generate recommendations as sequences of the learned semantic ID tokens.

Representing items as structured token sequences is intended to capture semantic relationships between items and thereby improve recommendation quality.
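
To make the semantic ID learning step concrete, here is a minimal, dependency-free sketch of residual k-means: each level clusters what the previous level left unexplained, and an item's semantic ID is the tuple of its cluster indices across levels. This is an illustration of the general technique, not GRID's implementation; the function names (kmeans, nearest, residual_kmeans) and the codebook_size parameter are assumptions made for this example, and the random vectors stand in for real item text embeddings.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to p in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float lists."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def residual_kmeans(embeddings, num_hierarchies, codebook_size, seed=0):
    """Return (codebooks, semantic_ids), with one code per level per item."""
    residuals = [list(e) for e in embeddings]
    codebooks = []
    semantic_ids = [[] for _ in embeddings]
    for level in range(num_hierarchies):
        centroids = kmeans(residuals, codebook_size, seed=seed + level)
        codebooks.append(centroids)
        for i, r in enumerate(residuals):
            code = nearest(r, centroids)
            semantic_ids[i].append(code)
            # The next level quantizes what this level failed to explain.
            residuals[i] = [a - b for a, b in zip(r, centroids[code])]
    return codebooks, [tuple(s) for s in semantic_ids]

# Toy stand-in for item embeddings: 32 items, 4 dimensions each.
rng = random.Random(42)
embeddings = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
codebooks, semantic_ids = residual_kmeans(
    embeddings, num_hierarchies=3, codebook_size=4)
print(semantic_ids[:3])
```

Items whose embeddings are close tend to share the leading codes of their IDs, which is what makes the IDs hierarchical. RQ-VAE and RVQ learn the codebooks differently, but the output has the same shape: one code per level per item.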

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using pip install -r requirements.txt.
  • Prerequisites: Python 3.10+; a CUDA-compatible GPU is recommended.
  • Data Preparation: Requires data in a specific format including user history (train/, validation/, test/) and item text (items/). Pre-processed Amazon data is available for download.
  • Setup: The process involves data preparation, embedding generation, semantic ID learning, and training the generative recommendation model. Specific commands are provided for each step.

Highlighted Details

  • Supports multiple methods for semantic ID learning: Residual K-means (RQ-KMeans), Residual Vector Quantization (RVQ), and residual quantization with a variational autoencoder (RQ-VAE).
  • Implements generative recommendation models such as TIGER.
  • Built with PyTorch and PyTorch Lightning, utilizing Hydra for configuration management.
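
A generative model such as TIGER consumes a user's history as a flat token sequence and emits the next item's semantic ID token by token. The sketch below shows one common way to build that sequence: each hierarchy level gets its own slice of the vocabulary so that code 3 at level 0 and code 3 at level 1 are distinct tokens. The offset scheme, the function names, and the item_to_sid mapping are assumptions for illustration; the source does not describe how GRID tokenizes sequences.

```python
def history_to_tokens(history, item_to_sid, codebook_size):
    """Flatten a list of item keys into one sequence of semantic ID tokens.

    Hypothetical scheme: token = level * codebook_size + code, so each
    hierarchy level occupies a disjoint range of the vocabulary.
    """
    tokens = []
    for item in history:
        for level, code in enumerate(item_to_sid[item]):
            tokens.append(level * codebook_size + code)
    return tokens

def tokens_to_sid(tokens, codebook_size):
    """Invert the offset scheme for the tokens of a single item."""
    return tuple(t - level * codebook_size for level, t in enumerate(tokens))

# Hypothetical item-to-semantic-ID mapping with 3 levels and 4 codes per level.
item_to_sid = {"item_a": (3, 1, 2), "item_b": (0, 3, 3)}
tokens = history_to_tokens(["item_a", "item_b"], item_to_sid, codebook_size=4)
print(tokens)                                       # [3, 5, 10, 0, 7, 11]
print(tokens_to_sid(tokens[3:], codebook_size=4))   # (0, 3, 3)
```

At inference time, a generated run of tokens is mapped back to a semantic ID and then to an item, which is why each ID must identify exactly one item.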

Maintenance & Community

  • Developed by Snap Research.
  • Contact information for the development team (Clark Mingxuan Ju, Liam Collins, Leonardo Neves) is provided for questions and support. Issues can also be raised on GitHub.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • Per the README, num_hierarchies must be set one higher for generative model training and inference than for semantic ID learning, because an extra digit is appended to each semantic ID for de-duplication. This mismatch between steps is easy to overlook.
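
The reason for the extra digit can be seen in a small sketch. Two items with similar embeddings may receive the same semantic ID, so a final token is appended to tell them apart, which makes every ID one token longer than the number of learned levels. The collision-counter scheme and the function name append_dedup_digit below are assumptions for illustration; the source says only that a digit is appended for de-duplication.

```python
from collections import defaultdict

def append_dedup_digit(semantic_ids):
    """Append a collision counter so that every semantic ID is unique.

    Hypothetical scheme: the n-th item seen with a given ID gets digit n-1.
    """
    seen = defaultdict(int)
    unique_ids = []
    for sid in semantic_ids:
        sid = tuple(sid)
        unique_ids.append(sid + (seen[sid],))
        seen[sid] += 1
    return unique_ids

# IDs learned with 3 levels; the first two items collide.
learned = [(3, 1, 2), (3, 1, 2), (0, 4, 4)]
deduped = append_dedup_digit(learned)
print(deduped)          # [(3, 1, 2, 0), (3, 1, 2, 1), (0, 4, 4, 0)]
print(len(deduped[0]))  # 4, i.e. the learned number of levels plus one
```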

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 11
  • Star History: 94 stars in the last 30 days
