GRRM by Indolent-Kawhi

LLM framework for generative reasoning recommendations

Created 7 months ago
264 stars

Top 96.5% on SourcePulse

Project Summary

Generative Reasoning Recommendation via LLMs (GREAM) addresses two challenges in applying Large Language Models (LLMs) to recommendation: the gap between textual semantics and collaborative filtering signals, and sparse user feedback. It introduces an end-to-end framework that unifies understanding, reasoning, and prediction. The project targets researchers and engineers who want to improve recommendation accuracy and reasoning capability with LLMs.

How It Works

GREAM integrates three key components:

  • Collaborative–Semantic Alignment: fuses heterogeneous textual evidence (titles, descriptions, reviews) to construct semantically consistent discrete item indices, aligning linguistic and interaction semantics.
  • Reasoning Curriculum Activation: builds a synthetic Chain-of-Thought (CoT) dataset and trains via a progressive curriculum covering behavioral evidence extraction, latent preference modeling, intent inference, and recommendation formulation.
  • Sparse-Regularized Group Policy Optimization (SRPO): a novel reinforcement learning method that combines Residual-Sensitive Verifiable Reward (RSVR) and Bonus-Calibrated Group Advantage Estimation (BGAE) for stable and verifiable fine-tuning under sparse signals.
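
To make the "sparse signals" problem concrete, here is a minimal Python sketch of plain group-relative advantage estimation, the general idea that group-based policy optimization builds on. It is not SRPO: this summary does not define RSVR or BGAE, so neither is implemented here. The function name `group_relative_advantages` and the `eps` parameter are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses.

    Generic baseline only (an assumption for illustration); it does not
    implement the RSVR reward or the BGAE estimator named in the summary.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Each response is scored relative to the other responses in its group.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One of four sampled recommendations hits the target item.
mixed = group_relative_advantages([0.0, 0.0, 1.0, 0.0])

# No sampled recommendation hits the target: every advantage is zero,
# so the group contributes no learning signal.
all_miss = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

With a binary hit-or-miss reward, a group in which every sample misses yields all-zero advantages. That is one way to see why sparse feedback is hard for group-based methods and why a method in this family might add extra reward shaping or calibration.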

Quick Start & Requirements

  • Installation: Execute bash scripts/install.sh to set up the environment.
  • Prerequisites: LLaMA-Factory is required for Supervised Fine-Tuning (SFT). Data preparation involves downloading and unzipping data.zip and sft_data.zip. Model training uses scripts/construct_model.py for Qwen3-4B-Instruct with an extended vocabulary. Evaluation commands use torchrun and may require deploying separate sglang servers for reasoning evaluation.
  • Resources: Training and evaluation likely need substantial compute, including GPUs, as suggested by the torchrun commands and their nproc_per_node arguments.
  • Links: Paper: https://arxiv.org/abs/2510.20815.
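
The "extended vocabulary" mentioned under Prerequisites can be pictured with a small, dependency-free Python sketch: discrete item indices become special tokens appended to a base vocabulary. The token format `<idx_LEVEL_CODE>`, the function names, and the toy vocabulary are all hypothetical. This summary does not describe what scripts/construct_model.py actually does, so treat this as an illustration of the idea rather than the project's implementation.

```python
def index_tokens(levels, codebook_size):
    """Build hypothetical special tokens, one per (level, code) pair."""
    return [
        f"<idx_{level}_{code}>"
        for level in range(levels)
        for code in range(codebook_size)
    ]


def encode_item(codes):
    """Render one item's discrete index (one code per level) as a token string."""
    return "".join(f"<idx_{level}_{code}>" for level, code in enumerate(codes))


def extend_vocab(vocab, new_tokens):
    """Return a copy of a token-to-id map with unseen tokens appended."""
    extended = dict(vocab)
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


base_vocab = {"hello": 0, "world": 1}  # toy stand-in for an LLM vocabulary
extended = extend_vocab(base_vocab, index_tokens(levels=2, codebook_size=3))
item_string = encode_item([2, 0])  # item whose index is code 2, then code 0
```

In a real setup the model's embedding table would also have to grow to cover the new token ids. That step depends on the training stack and is not shown here.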

Highlighted Details

  • Introduces GREAM, an end-to-end generative reasoning recommendation framework.
  • Features a novel Sparse-Regularized Group Policy Optimization (SRPO) method for stable RL fine-tuning on sparse data.
  • Utilizes a synthetic Chain-of-Thought (CoT) dataset for progressive reasoning curriculum training.
  • Aligns heterogeneous textual evidence with collaborative filtering signals for improved item indexing.

Maintenance & Community

No specific details on community channels, active contributors, or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state the project's license.

Limitations & Caveats

The README does not detail any specific limitations, known bugs, or alpha status of the project. The setup for reasoning evaluation requires deploying separate sglang servers.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

49 stars in the last 30 days
