GRRM by Indolent-Kawhi

LLM framework for generative reasoning recommendations

Created 8 months ago

272 stars

Top 94.6% on SourcePulse

Project Summary

Generative Reasoning Recommendation via LLMs (GREAM) addresses two obstacles to applying Large Language Models (LLMs) to recommendation: the gap between textual semantics and collaborative filtering signals, and sparse user feedback. It introduces an end-to-end framework that unifies understanding, reasoning, and prediction for recommendation systems. The project targets researchers and engineers who want to improve recommendation accuracy and reasoning capabilities with LLMs.

How It Works

GREAM integrates three key components:
Collaborative–Semantic Alignment: fuses heterogeneous textual evidence (titles, descriptions, reviews) to construct semantically consistent discrete item indices, aligning linguistic and interaction semantics.
Reasoning Curriculum Activation: builds a synthetic Chain-of-Thought (CoT) dataset and trains via a progressive curriculum covering behavioral evidence extraction, latent preference modeling, intent inference, and recommendation formulation.
Sparse-Regularized Group Policy Optimization (SRPO): a novel reinforcement learning method that combines Residual-Sensitive Verifiable Reward (RSVR) and Bonus-Calibrated Group Advantage Estimation (BGAE) for stable and verifiable fine-tuning under sparse signals.
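As a rough illustration of the group-based advantage idea that SRPO builds on, the sketch below computes plain group-relative advantages in the style of GRPO: each sampled output's reward is normalized against the mean and standard deviation of its own group. This is not the RSVR reward or the BGAE estimator, whose exact definitions are not given in this summary. The function name, the `eps` parameter, and the example rewards are assumptions made for illustration only.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of sampled outputs.

    Generic GRPO-style baseline, NOT the BGAE estimator from the paper.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical sparse group: only one of four sampled outputs is rewarded.
    print(group_relative_advantages([0.0, 0.0, 0.0, 1.0]))
    # Hypothetical group where no output is rewarded at all.
    print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
```

In the second call every advantage is zero, so the group contributes no learning signal. This illustrates why sparse feedback is difficult for group-based policy optimization.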

Quick Start & Requirements

Installation: Execute bash scripts/install.sh to set up the environment.
Prerequisites: Requires LLaMA-Factory for Supervised Fine-Tuning (SFT).
Data preparation: Download and unzip data.zip and sft_data.zip.
Model setup: Uses scripts/construct_model.py to prepare Qwen3-4B-Instruct with an extended vocabulary.
Evaluation: Commands use torchrun; reasoning evaluation may require deploying separate sglang servers.
Resources: Training and evaluation likely need significant compute, including GPUs, as suggested by the torchrun and nproc_per_node arguments.
Links: Paper: https://arxiv.org/abs/2510.20815.
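The "extended vocabulary" mentioned above presumably adds tokens for the discrete item indices. A minimal sketch of that idea follows, assuming a hypothetical layout with several codebook levels and a fixed number of codes per level. The token format (`<a_0>`, `<b_17>`), the level count, and the codebook size are illustrative assumptions and are not taken from scripts/construct_model.py.

```python
from string import ascii_lowercase
from typing import List, Sequence


def build_index_tokens(num_levels: int, codebook_size: int) -> List[str]:
    """Enumerate one special token per (level, code) pair, e.g. <a_0>, <b_17>.

    The token naming scheme is hypothetical, chosen only for illustration.
    """
    if not 0 < num_levels <= len(ascii_lowercase):
        raise ValueError("num_levels must be between 1 and 26")
    return [
        f"<{ascii_lowercase[level]}_{code}>"
        for level in range(num_levels)
        for code in range(codebook_size)
    ]


def item_to_tokens(codes: Sequence[int]) -> str:
    """Render one item's discrete index (one code per level) as a token string."""
    return "".join(
        f"<{ascii_lowercase[level]}_{code}>" for level, code in enumerate(codes)
    )


if __name__ == "__main__":
    tokens = build_index_tokens(num_levels=3, codebook_size=4)
    print(len(tokens))                 # number of new vocabulary entries
    print(item_to_tokens([2, 0, 3]))   # token string for one hypothetical item
```

In a Hugging Face Transformers setup, tokens like these would typically be registered with the tokenizer's `add_tokens` method, followed by `resize_token_embeddings` on the model. Whether the repository does it this way is not stated in the README summary.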

Highlighted Details

Introduces GREAM, an end-to-end generative reasoning recommendation framework.
Features a novel Sparse-Regularized Group Policy Optimization (SRPO) method for stable RL fine-tuning on sparse data.
Utilizes a synthetic Chain-of-Thought (CoT) dataset for progressive reasoning curriculum training.
Aligns heterogeneous textual evidence with collaborative filtering signals for improved item indexing.

Maintenance & Community

No specific details on community channels, active contributors, or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state the project's license.

Limitations & Caveats

The README does not mention any limitations, known bugs, or alpha status. Reasoning evaluation requires deploying separate sglang servers.
