AIGeeksGroup
3D scene understanding model
Top 79.6% on SourcePulse
3D-R1 is a foundation model designed to enhance the reasoning capabilities of 3D Vision-Language Models (VLMs) for unified scene understanding. It addresses the limitations of current 3D VLMs in robust reasoning and generalization, offering a solution for researchers and practitioners working with 3D spatial data.
How It Works
3D-R1 combines three components to improve 3D scene understanding. First, it uses Scene-30K, a synthetic dataset with Chain-of-Thought (CoT) reasoning built using Gemini 2.5 Pro. Second, it applies Reinforcement Learning from Human Feedback (RLHF) techniques, specifically GRPO, guided by three reward functions: a perception reward, a semantic similarity reward, and a format reward. Third, a dynamic view selection strategy adaptively chooses the most informative perspectives, further boosting performance.
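To make the reward design more concrete, here is a minimal Python sketch of how three such rewards could be combined and turned into GRPO-style group-relative advantages. It is an illustration under stated assumptions, not the project's code: the `<think>`/`<answer>` response layout, the use of 3D box IoU as the perception signal, cosine similarity over precomputed embeddings as the semantic signal, the equal weighting, and all function names are hypothetical. The only part taken from GRPO as generally described is the normalization of each reward against the mean and standard deviation of its sampled group.

```python
import math
import re
import statistics


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed
    <think>...</think><answer>...</answer> layout, else 0.0."""
    pattern = r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, response, flags=re.DOTALL) else 0.0


def box_iou_3d(a, b) -> float:
    """IoU of two axis-aligned boxes (xmin, ymin, zmin, xmax, ymax, zmax).
    Stands in for the perception reward (an assumption)."""
    inter = 1.0
    for i in range(3):
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0
        inter *= overlap

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)


def cosine_similarity(u, v) -> float:
    """Cosine similarity of two embedding vectors.
    Stands in for the semantic similarity reward (an assumption)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def total_reward(response, pred_box, gt_box, pred_emb, gt_emb) -> float:
    """Equal-weight sum of the three rewards (the weighting is hypothetical)."""
    return (
        format_reward(response)
        + box_iou_3d(pred_box, gt_box)
        + cosine_similarity(pred_emb, gt_emb)
    )


def group_relative_advantages(rewards, eps: float = 1e-8):
    """Normalize each reward against its sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In use, one would sample several responses for the same prompt, score each with `total_reward`, and pass the list of scores to `group_relative_advantages`; responses scoring above the group average receive positive advantages and those below receive negative ones.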
Quick Start & Requirements
Dependencies: h5py, scipy, cython, plyfile, trimesh, networkx, torch (2.0.1+cu118), google-generative-ai, peft, transformers, accelerate, tqdm, orjson, plus git-based installs of CLIP and Depth-Anything. PointNet++ and accelerated GIOU need to be built from source.
Highlighted Details
Maintenance & Community
The project is associated with Ting Huang, Zeyu Zhang, and Hao Tang. Further community engagement details (like Discord/Slack) are not specified in the README.
Licensing & Compatibility
The README does not explicitly state the license type or compatibility for commercial use.
Limitations & Caveats
The project acknowledges a bounding box drift issue in visualizations, which is currently being addressed. A detailed visualization tutorial and a Hugging Face demo are planned but not yet released.