RLVR-World by thuml

Training world models with Reinforcement Learning from Verifiable Rewards

Created 1 year ago

267 stars

Top 95.7% on SourcePulse

Project Summary

RLVR-World trains world models for both language and video by casting the two modalities as a single sequence modeling problem. It targets researchers and engineers, and aims to improve model performance by using reinforcement learning to optimize task-specific prediction metrics directly.

How It Works

The framework applies RLVR to world models, treating world modeling as a sequence modeling problem. Task-specific prediction metrics serve directly as rewards for reinforcement learning. This aligns the learned dynamics with downstream objectives and may yield more effective and generalizable world representations.
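
As a rough illustration of using a prediction metric as a reward, the sketch below scores several sampled next-state predictions against a ground-truth sequence. It is not taken from the RLVR-World code: the function names (exact_match_reward, token_accuracy_reward, score_samples), the two metrics, and the mean-reward baseline are all assumptions made for illustration, and the summary above does not say which RL algorithm or metrics the project actually uses.

```python
from typing import Callable, List, Sequence

Tokens = Sequence[int]


def exact_match_reward(predicted: Tokens, target: Tokens) -> float:
    """Hypothetical metric: 1.0 if the prediction equals the target, else 0.0."""
    return 1.0 if list(predicted) == list(target) else 0.0


def token_accuracy_reward(predicted: Tokens, target: Tokens) -> float:
    """Hypothetical metric: fraction of positions predicted correctly.

    Positions missing from the shorter sequence count as errors.
    """
    length = max(len(predicted), len(target))
    if length == 0:
        return 1.0  # two empty sequences agree trivially
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / length


def score_samples(
    samples: List[Tokens],
    target: Tokens,
    metric: Callable[[Tokens, Tokens], float],
) -> List[float]:
    """Turn metric values into advantages by subtracting the mean reward.

    The mean baseline is an assumed, simple variance-reduction choice,
    not necessarily what the project does.
    """
    rewards = [metric(sample, target) for sample in samples]
    baseline = sum(rewards) / len(rewards)
    return [reward - baseline for reward in rewards]


if __name__ == "__main__":
    target_state = [1, 2, 3]
    sampled_predictions = [[1, 2, 3], [1, 2, 4]]
    print(score_samples(sampled_predictions, target_state, exact_match_reward))
```

In a training loop, advantages like these would weight the log-likelihood of each sampled sequence, so predictions that score better on the metric become more likely. The reward is "verifiable" in the sense that it is computed from ground truth rather than from a learned reward model.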

Quick Start & Requirements

The repository provides released models, datasets, and training code. However, the README does not give installation instructions, detailed prerequisites (Python version, libraries, hardware), or setup time estimates. Users may need to consult the repositories it cites or contact the authors.

Highlighted Details

Supports Language (text games, web navigation) and Video (robot manipulation) modalities.
Provides pre-trained world models and datasets, including video tokenizers.
Applications include text game state prediction, web agent control, and robot manipulation trajectory prediction.
Features models trained via supervised fine-tuning (SFT) and RLVR for comparative analysis.

Maintenance & Community

The project is associated with NeurIPS 2025, provides a contact email (wujialong0229@gmail.com), and acknowledges several GitHub repositories. No community channels or roadmap are mentioned.

Licensing & Compatibility

The README does not state a license. Usage rights, particularly for commercial applications, need to be verified separately.

Limitations & Caveats

The README describes contributions rather than limitations. As the code accompanies a NeurIPS 2025 publication, it is research-oriented and may require significant effort to deploy in production, a difficulty compounded by the lack of detailed setup instructions.
