RynnBrain by alibaba-damo-academy

Embodied foundation models for physical reality

Created 2 weeks ago

489 stars

Top 63.1% on SourcePulse


Summary

RynnBrain introduces open embodied foundation models grounded in physical reality, designed for researchers and engineers in robotics and AI. It offers comprehensive egocentric understanding, spatio-temporal localization, and physics-aware planning, enabling more robust and nuanced interaction with the physical world.

How It Works

The project utilizes a unified encoder-decoder architecture, available in Dense (2B, 8B) and Mixture-of-Experts (30B-A3B) variants. It processes omni-vision and textual inputs to generate multi-modal outputs, including spatial trajectories and action plans. Training on extensive spatio-temporal, physical-space, and general knowledge data underpins its capabilities. Key innovations include an interleaved reasoning strategy that grounds textual understanding in physical space and physics-aware planning that integrates object affordances for complex task execution.
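The interleaved reasoning strategy described above alternates free-text steps with spatial groundings. The sketch below illustrates the idea of parsing such a trace; the tag format (`<point x=... y=...>`) and the trace itself are hypothetical, not part of any documented RynnBrain output format:

```python
import re

# Hypothetical interleaved trace: textual reasoning steps mixed with
# spatial grounding tags. The tag syntax here is illustrative only.
trace = (
    "The mug is on the left counter <point x=120 y=340> "
    "and the sink is ahead <point x=480 y=290>, "
    "so move the mug toward the sink."
)

TAG = re.compile(r"<point x=(\d+) y=(\d+)>")

def parse_trace(trace):
    """Split a trace into alternating ('text', str) and ('point', (x, y)) segments."""
    segments, pos = [], 0
    for m in TAG.finditer(trace):
        text = trace[pos:m.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("point", (int(m.group(1)), int(m.group(2)))))
        pos = m.end()
    tail = trace[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

segments = parse_trace(trace)
points = [p for kind, p in segments if kind == "point"]
print(points)  # coordinates recovered from the interleaved trace
```

A downstream consumer could route the text segments to a language head and the point segments to a spatial head, which is the essence of grounding textual reasoning in physical space.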

Quick Start & Requirements

Installation is straightforward via pip install transformers==4.57.1; beyond that, shell dependencies are minimal. Comprehensive examples and use-case demonstrations are available in the provided cookbooks.

Highlighted Details

  • Comprehensive egocentric understanding: Excels in fine-grained video understanding and egocentric cognition, covering embodied QA, counting, and OCR.
  • Diverse spatio-temporal localization: Possesses powerful localization capabilities across episodic memory, enabling precise identification of objects, target areas, and motion trajectories.
  • Physical-space reasoning: Employs an interleaved reasoning strategy that alternates between textual and spatial grounding, ensuring reasoning is rooted in the physical environment.
  • Physics-aware precise planning: Integrates located affordances and object information into planning, enabling downstream VLA models to execute intricate tasks with fine-grained instructions.
  • RynnBrain-Bench: A high-dimensional benchmark for embodied understanding evaluating object cognition, spatial cognition, grounding, and pointing.
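To make the physics-aware planning bullet concrete, here is a minimal sketch of how located affordances might be attached to fine-grained plan steps handed to a downstream VLA model. All structures and names here are hypothetical; RynnBrain's actual plan format is not documented in this summary:

```python
from dataclasses import dataclass, field

# Hypothetical data structures, for illustration only.
@dataclass
class Affordance:
    object_name: str
    action: str             # e.g. "grasp", "pour"
    location: tuple         # (x, y) in the egocentric frame

@dataclass
class PlanStep:
    instruction: str
    affordances: list = field(default_factory=list)

def build_plan(goal, detections):
    """Turn located (object, action, location) detections into plan steps
    with their affordances attached, ending with the overall goal."""
    steps = []
    for obj, action, loc in detections:
        steps.append(PlanStep(
            instruction=f"{action} the {obj} at {loc}",
            affordances=[Affordance(obj, action, loc)],
        ))
    steps.append(PlanStep(instruction=f"complete goal: {goal}"))
    return steps

plan = build_plan(
    "pour water into the mug",
    [("kettle", "grasp", (210, 155)), ("mug", "approach", (480, 300))],
)
for step in plan:
    print(step.instruction)
```

The point of the design is that each instruction carries its grounded affordance, so the executing VLA model receives both the what (the instruction) and the where/how (the located affordance).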

Maintenance & Community

The project is part of an active ecosystem with several related GitHub repositories and arXiv papers, including RynnEC, RynnScale, RynnVLA-001, RynnVLA-002, RynnRCP, and RynnMotion. This indicates ongoing development and research by the Alibaba Damo Academy team. No direct community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The provided README does not explicitly detail any limitations, known bugs, or alpha/beta status of the RynnBrain models.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 5

Star History

495 stars in the last 17 days
