MiniOneRec (AkaliKong): Open-source framework for scalable generative recommendation
Summary
MiniOneRec is an open-source, end-to-end framework for scaling generative recommendation systems. It covers the complete workflow: constructing compact Semantic Item IDs (SIDs), supervised fine-tuning (SFT), and recommendation-oriented reinforcement learning (RL). By integrating linguistic knowledge from large language models (LLMs) with discrete item codes, it enables high-performance, lightweight recommendation. It targets researchers and engineers working on recommender systems.
How It Works
The framework first converts each item into a Semantic Item ID (SID): a frozen text encoder embeds the item's title and description, and a three-level RQ-VAE quantizes the embedding into a compact, discrete code. Next, SFT trains the model on user histories represented as token sequences to predict the next SID, using language-alignment objectives to transfer the LLM's world knowledge into the SID space. Finally, recommendation-oriented RL based on GRPO refines the policy. It uses normalized rewards, KL penalties, and constrained beam search to produce diverse, valid recommendations, and its reward signal combines correctness with rank-awareness.
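To make the SID construction step concrete, here is a minimal sketch of the residual quantization lookup that a three-level quantizer performs at inference time. It is not MiniOneRec's code: the function names, the toy 2-D codebooks, and the example embedding are all assumptions chosen for illustration. A real RQ-VAE learns its codebooks and also passes the text embedding through an encoder first.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def residual_quantize(
    embedding: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize an embedding level by level.

    Each level picks the nearest code for the current residual, then
    subtracts that code so the next level only models what is left over.
    The list of chosen indices plays the role of the item's SID.
    """
    residual = list(embedding)
    sid: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid, residual


# Hypothetical hand-written codebooks: three levels, two codes each.
TOY_CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.0], [0.0, 0.5]],
    [[0.25, 0.0], [0.0, 0.25]],
]

sid, residual = residual_quantize([0.25, 1.5], TOY_CODEBOOKS)
print(sid, residual)  # [1, 1, 0] [0.0, 0.0]
```

Each level refines the approximation left by the previous one, which is why a short tuple of small indices can stand in for a high-dimensional embedding.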
Quick Start & Requirements
Reproducing the results requires 4-8 A100/H100 GPUs. To install, create and activate a Python 3.11 conda environment, then run pip install -r requirements.txt. The framework provides pre-trained checkpoints and an end-to-end workflow: data preparation scripts for Amazon datasets, SID construction (RQ-VAE, RQ-Kmeans), SFT, RL, and offline evaluation. A technical report, datasets, and checkpoints are mentioned, but their URLs are not given here; a repository overview link is available.
Maintenance & Community
Developed by LDS AlphaLab NExT, the project welcomes community contributions. Upcoming features include expanded SID construction algorithms (R-VQ, RQ-Kmeans, RQ-OPQ, RQ-VAE-v2), a "MiniOneRec-Think" module for dialogue and reasoning, and broader dataset support. No specific community channels (e.g., Discord, Slack) are listed.
Licensing & Compatibility
The project is released under the Apache-2.0 license, a permissive license that allows commercial use and integration into closed-source projects, subject to its notice and attribution requirements.
Limitations & Caveats
The RQ-Kmeans method, which is based on semantic embeddings, is noted to have a relatively high collision rate, meaning that distinct items more often receive the same SID. The RL stage is presented as optional for production-scale datasets because of its cost and diminishing marginal returns, so its use on very large datasets should be weighed carefully.
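One way to check the collision concern on your own data is to measure how many items share an SID. The sketch below is an illustration only: the source does not say how the project defines or computes its collision rate, so the definition used here (one minus the ratio of distinct SIDs to items) and the function name collision_rate are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def collision_rate(item_sids: Dict[Hashable, Sequence[int]]) -> float:
    """Fraction of items that do not get a unique SID of their own.

    Assumed definition: 1 - (number of distinct SIDs / number of items).
    A value of 0.0 means every item has a different SID.
    """
    if not item_sids:
        return 0.0
    distinct = Counter(tuple(sid) for sid in item_sids.values())
    return 1.0 - len(distinct) / len(item_sids)


# Hypothetical catalogue: items "a" and "b" collide on the same SID.
toy_sids = {
    "a": [1, 1, 0],
    "b": [1, 1, 0],
    "c": [0, 1, 1],
    "d": [1, 0, 0],
}
print(collision_rate(toy_sids))  # 0.25
```

Comparing this number across SID construction methods on the same catalogue gives a rough sense of how much disambiguation each would need.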