microsoft
Text-to-video generation with reinforced 3D geometric consistency
Top 76.8% on SourcePulse
Summary
World-R1 improves the 3D geometric consistency of text-to-video generation through reinforcement learning. It is aimed at researchers who want stronger 3D understanding without altering the base model or relying on extensive 3D supervision, while preserving visual quality and motion diversity.
How It Works
It uses camera-aware latent initialization to inject motion. The model is then post-trained with Flow-GRPO, a reinforcement-learning method, using 3D-aware rewards (meta-view, reconstruction, and trajectory) alongside aesthetic rewards. Periodic dynamic-only training phases increase motion diversity while retaining 3D consistency.
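The reward stage described above can be illustrated with a minimal sketch of GRPO-style group-relative advantages computed over a weighted sum of the four reward terms. This is not World-R1's code. The names `RewardWeights`, `combined_reward`, `group_advantages`, and `use_dynamic_only`, the equal default weights, and the every-`period`-steps schedule for the dynamic-only phase are all hypothetical assumptions chosen for illustration.

```python
"""Hypothetical sketch: GRPO-style advantages over combined 3D-aware + aesthetic rewards.
Not the project's implementation; names, weights, and schedule are assumptions."""
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class RewardWeights:
    # Hypothetical weights; the actual weighting is not documented in the listing.
    meta_view: float = 1.0
    reconstruction: float = 1.0
    trajectory: float = 1.0
    aesthetic: float = 1.0


def combined_reward(scores: dict[str, float], w: RewardWeights) -> float:
    """Weighted sum of the 3D-aware and aesthetic reward terms for one generated video."""
    return (w.meta_view * scores["meta_view"]
            + w.reconstruction * scores["reconstruction"]
            + w.trajectory * scores["trajectory"]
            + w.aesthetic * scores["aesthetic"])


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each sample's reward by the mean and
    (population) standard deviation of its group, i.e. all samples for one prompt."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def use_dynamic_only(step: int, period: int = 5) -> bool:
    """Hypothetical schedule: the last step of every `period` steps is a dynamic-only step."""
    return period > 0 and step % period == period - 1


if __name__ == "__main__":
    weights = RewardWeights()
    group = [
        {"meta_view": 0.8, "reconstruction": 0.6, "trajectory": 0.7, "aesthetic": 0.5},
        {"meta_view": 0.4, "reconstruction": 0.5, "trajectory": 0.3, "aesthetic": 0.6},
    ]
    rewards = [combined_reward(s, weights) for s in group]
    print(group_advantages(rewards))  # roughly [1.0, -1.0]
```

Normalizing within each prompt's group means only the relative quality of samples matters, so no separate value network is needed; how World-R1 actually balances the reward terms is not stated here.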
Quick Start & Requirements
Requires Python 3.10+, CUDA, and a PyTorch build matching the installed CUDA version. Installation consists of creating and activating a conda environment (conda create, conda activate), then installing PyTorch and the core training/inference packages. Training scripts are provided; the reward servers must be launched as separate processes.
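Before running the training scripts, a quick pre-flight check can confirm the basic requirements listed above. This is a minimal hypothetical sketch, not part of the repository: `check_environment` is an illustrative name, and it only checks the Python version and whether PyTorch was built with CUDA and can see a device. It does not verify the specific CUDA/PyTorch versions the project pins, since the listing does not state them.

```python
"""Hypothetical pre-flight check: Python 3.10+, PyTorch installed, CUDA usable."""
import sys


def check_environment() -> dict:
    report = {
        "python_ok": sys.version_info >= (3, 10),
        "torch_installed": False,
        "torch_cuda_build": None,  # CUDA version PyTorch was built against (None for CPU-only builds)
        "cuda_available": False,   # whether a CUDA device is usable at runtime
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```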
Maintenance & Community
Developed by Zhejiang University and Microsoft Research. Support and security details are in SUPPORT.md and SECURITY.md. No direct community links or roadmap are provided.
Licensing & Compatibility
MIT license. Bundled third-party code (Flow-GRPO, Depth Anything 3) is subject to separate upstream licenses (see licenses/), which should be reviewed before building derivative works.
Limitations & Caveats
The README does not state explicit limitations. Setup depends on specific CUDA/PyTorch versions, and training involves complex multi-node setups plus separately run reward server processes.
3 weeks ago
Inactive