RynnBrain by alibaba-damo-academy

Embodied foundation models for physical reality

Created 2 weeks ago

489 stars

Top 63.1% on SourcePulse


Summary

RynnBrain introduces open embodied foundation models grounded in physical reality, designed for researchers and engineers in robotics and AI. It offers comprehensive egocentric understanding, spatio-temporal localization, and physics-aware planning, enabling more robust and nuanced interaction with the physical world.

How It Works

The project utilizes a unified encoder-decoder architecture, available in Dense (2B, 8B) and Mixture-of-Experts (30B-A3B) variants. It processes omni-vision and textual inputs to generate multi-modal outputs, including spatial trajectories and action plans. Training on extensive spatio-temporal, physical-space, and general knowledge data underpins its capabilities. Key innovations include an interleaved reasoning strategy that grounds textual understanding in physical space and physics-aware planning that integrates object affordances for complex task execution.
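The interleaved reasoning strategy described above alternates free-text steps with spatial groundings. The sketch below illustrates the idea of parsing such a trace; the tag format (`<point x=... y=...>`) and the trace itself are hypothetical, not part of any documented RynnBrain output format:

```python
import re

# Hypothetical interleaved trace: textual reasoning steps mixed with
# spatial grounding tags. The tag syntax here is illustrative only.
trace = (
    "The mug is on the left counter <point x=120 y=340> "
    "and the sink is ahead <point x=480 y=290>, "
    "so move the mug toward the sink."
)

TAG = re.compile(r"<point x=(\d+) y=(\d+)>")

def parse_trace(trace):
    """Split a trace into alternating ('text', str) and ('point', (x, y)) segments."""
    segments, pos = [], 0
    for m in TAG.finditer(trace):
        text = trace[pos:m.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("point", (int(m.group(1)), int(m.group(2)))))
        pos = m.end()
    tail = trace[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

segments = parse_trace(trace)
points = [p for kind, p in segments if kind == "point"]
print(points)  # coordinates recovered from the interleaved trace
```

A downstream consumer could route the text segments to a language head and the point segments to a spatial head, which is the essence of grounding textual reasoning in physical space.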

Quick Start & Requirements

Installation is straightforward via pip install transformers==4.57.1; beyond that, shell dependencies are minimal. Comprehensive examples and use-case demonstrations are available in the provided cookbooks.

Highlighted Details

  • Comprehensive egocentric understanding: Excels in fine-grained video understanding and egocentric cognition, covering embodied QA, counting, and OCR.
  • Diverse spatio-temporal localization: Possesses powerful localization capabilities across episodic memory, enabling precise identification of objects, target areas, and motion trajectories.
  • Physical-space reasoning: Employs an interleaved reasoning strategy that alternates between textual and spatial grounding, ensuring reasoning is rooted in the physical environment.
  • Physics-aware precise planning: Integrates located affordances and object information into planning, enabling downstream VLA models to execute intricate tasks with fine-grained instructions.
  • RynnBrain-Bench: A high-dimensional benchmark for embodied understanding evaluating object cognition, spatial cognition, grounding, and pointing.
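To make the physics-aware planning bullet concrete, here is a minimal sketch of how located affordances might be attached to fine-grained plan steps handed to a downstream VLA model. All structures and names here are hypothetical; RynnBrain's actual plan format is not documented in this summary:

```python
from dataclasses import dataclass, field

# Hypothetical data structures, for illustration only.
@dataclass
class Affordance:
    object_name: str
    action: str             # e.g. "grasp", "pour"
    location: tuple         # (x, y) in the egocentric frame

@dataclass
class PlanStep:
    instruction: str
    affordances: list = field(default_factory=list)

def build_plan(goal, detections):
    """Turn located (object, action, location) detections into plan steps
    with their affordances attached, ending with the overall goal."""
    steps = []
    for obj, action, loc in detections:
        steps.append(PlanStep(
            instruction=f"{action} the {obj} at {loc}",
            affordances=[Affordance(obj, action, loc)],
        ))
    steps.append(PlanStep(instruction=f"complete goal: {goal}"))
    return steps

plan = build_plan(
    "pour water into the mug",
    [("kettle", "grasp", (210, 155)), ("mug", "approach", (480, 300))],
)
for step in plan:
    print(step.instruction)
```

The point of the design is that each instruction carries its grounded affordance, so the executing VLA model receives both the what (the instruction) and the where/how (the located affordance).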

Maintenance & Community

The project is part of an active ecosystem with several related GitHub repositories and arXiv papers, including RynnEC, RynnScale, RynnVLA-001, RynnVLA-002, RynnRCP, and RynnMotion. This indicates ongoing development and research by the Alibaba Damo Academy team. No direct community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, which is permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The provided README does not explicitly detail any limitations, known bugs, or alpha/beta status of the RynnBrain models.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 5

Star History

495 stars in the last 17 days
