RLVR-World by thuml

Training world models with Reinforcement Learning from Verifiable Rewards

Created 1 year ago
252 stars

Top 99.6% on SourcePulse

Project Summary

RLVR-World trains world models across language and video modalities by casting both as sequence modeling. It targets researchers and engineers, and aims to improve model performance by using reinforcement learning to optimize task-specific prediction metrics directly.

How It Works

The framework applies RLVR (Reinforcement Learning from Verifiable Rewards) to world models, treating world modeling as a sequence modeling problem. Task-specific prediction metrics, computed against ground-truth outcomes, serve directly as the rewards for reinforcement learning. This aligns the learned dynamics with downstream objectives rather than with a surrogate training loss, potentially yielding more effective and generalizable world representations.
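As a rough illustration of using a prediction metric as a reward, the Python sketch below scores several sampled predictions against a ground-truth next state with token-level F1, then standardizes the rewards within the sample group. The function names, the choice of F1, and the group standardization are assumptions made for illustration only; this summary does not state which metrics or RL algorithm the project actually uses.

```python
from collections import Counter
from statistics import mean, pstdev


def token_f1(predicted: list[str], target: list[str]) -> float:
    """Token-level F1 between a predicted and a ground-truth token sequence.

    Hypothetical reward metric: chosen for illustration, not taken from the repo.
    """
    if not predicted or not target:
        # Two empty sequences match exactly; otherwise there is no overlap.
        return float(predicted == target)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both sequences.
    overlap = sum((Counter(predicted) & Counter(target)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(target)
    return 2 * precision * recall / (precision + recall)


def standardize(rewards: list[float]) -> list[float]:
    """Center and scale rewards within a group of samples (assumed baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


# Ground-truth next state and three sampled predictions for the same input.
target = "door is open".split()
samples = [
    "door is open".split(),
    "door is closed".split(),
    "window is closed".split(),
]

rewards = [token_f1(s, target) for s in samples]
advantages = standardize(rewards)
```

In a policy-gradient update, each sample's log-probability would be weighted by its standardized reward, so predictions that score higher on the metric are reinforced relative to the rest of the group.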

Quick Start & Requirements

The repository provides released models, datasets, and training code. However, the README does not include installation instructions, detailed prerequisites (Python version, libraries, hardware), or setup time estimates. Users may need to consult the cited repositories or contact the authors.

Highlighted Details

  • Supports Language (text games, web navigation) and Video (robot manipulation) modalities.
  • Provides pre-trained world models and datasets, including video tokenizers.
  • Applications include text game state prediction, web agent control, and robot manipulation trajectory prediction.
  • Features models trained via supervised fine-tuning (SFT) and RLVR for comparative analysis.

Maintenance & Community

The project is associated with NeurIPS 2025, provides a contact email (wujialong0229@gmail.com), and acknowledges several GitHub repositories. No community channels or roadmap are mentioned.

Licensing & Compatibility

The README lacks explicit licensing information. Usage rights are therefore unclear and should be verified before use, especially for commercial applications.

Limitations & Caveats

The README focuses on contributions, not limitations. As a NeurIPS 2025 publication, the codebase is research-oriented and may require significant effort for production deployment, compounded by the lack of detailed setup instructions.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 30 days
