Unified brain model for robotic manipulation
Top 90.7% on SourcePulse
RoboBrain is a unified brain model for robotic manipulation that addresses the limitations of current multimodal large language models (MLLMs) on long-horizon tasks. Aimed at robotics and AI researchers and developers, it enables robots to perform complex manipulation by improving planning, affordance perception, and trajectory prediction.
How It Works
RoboBrain builds on MLLMs by incorporating ShareRobot, a high-quality heterogeneous dataset with multi-dimensional annotations for task planning, object affordance, and end-effector trajectories. It employs a multi-stage training strategy that leverages long videos and high-resolution images to improve manipulation performance. The model is modular, with separate checkpoints for planning, affordance prediction (A-LoRA), and trajectory prediction (T-LoRA), allowing flexible integration and fine-tuning.
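The LoRA-based modularity suggests a simple loading pattern: attach a task-specific adapter to the base planning checkpoint. Below is a minimal sketch using Hugging Face `transformers` and `peft`; the model and adapter IDs are hypothetical placeholders, not paths published by the repo.

```python
# Minimal sketch of the modular checkpoint idea: a base planning model
# plus a task-specific LoRA adapter. All IDs below are hypothetical.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("BAAI/RoboBrain")  # assumed base ID
# Attach the affordance adapter (A-LoRA); swap in the T-LoRA adapter
# instead for trajectory prediction.
model = PeftModel.from_pretrained(base, "BAAI/RoboBrain-LoRA-Affordance")
```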
Quick Start & Requirements
Setup creates a conda environment, installs the requirements, and pins a specific vLLM version:

```bash
conda create -n robobrain python=3.10
conda activate robobrain
pip install -r requirements.txt
pip install vllm==0.6.6.post1
```
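With vLLM installed, inference can follow vLLM's standard offline API. This is a hedged sketch; the model ID and prompt are illustrative assumptions, not taken from the README.

```python
# Hedged sketch of offline inference via vLLM's standard API.
# The model ID and prompt below are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="BAAI/RoboBrain")  # assumed Hugging Face model ID
params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["Plan the steps to pick up the red mug."], params)
print(outputs[0].outputs[0].text)
```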
Highlighted Details
Maintenance & Community
The project is actively developed; recent updates include RoboBrain 2.0 releases and newly available checkpoints. It acknowledges contributions from LLaVA-NeXT, lmms-eval, and vLLM, and links to Hugging Face, ModelScope, and the paper.
Licensing & Compatibility
The repository does not explicitly state a license. However, the project is presented as open-source, with models available on Hugging Face and ModelScope. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify licensing details, which could impact commercial adoption. Detailed hardware requirements for training are not provided, suggesting potentially high resource needs. The project is presented as a research artifact, and stability for production environments is not guaranteed.