RoboBrain by FlagOpen

Unified brain model for robotic manipulation

created 4 months ago
290 stars

Top 90.7% on SourcePulse

View on GitHub
Project Summary

RoboBrain is a unified brain model for robotic manipulation, addressing limitations of current multimodal large language models (MLLMs) on long-horizon tasks. It targets researchers and developers in robotics and AI, enabling robots to perform complex manipulation tasks by improving planning, affordance perception, and trajectory prediction.

How It Works

RoboBrain builds upon MLLMs by incorporating a high-quality, heterogeneous dataset called ShareRobot, which includes multi-dimensional annotations for task planning, object affordance, and end-effector trajectory. It employs a multi-stage training strategy, leveraging long videos and high-resolution images to improve robotic manipulation performance. The model is modular, with separate checkpoints for planning, affordance (A-LoRA), and trajectory (T-LoRA) prediction, allowing for flexible integration and fine-tuning.
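The modular checkpoint design lends itself to the standard base-model-plus-adapter loading pattern. Below is a minimal sketch of that idea using transformers and peft, assuming the checkpoints are published as a base model plus LoRA adapters on Hugging Face; the repo ids and the Auto class used here are placeholders for illustration, not the project's documented loading path.

```python
import torch
from transformers import AutoModelForVision2Seq
from peft import PeftModel

BASE_ID = "BAAI/RoboBrain"                          # assumed base (planning) checkpoint id
AFFORDANCE_LORA = "BAAI/RoboBrain-LoRA-Affordance"  # assumed A-LoRA adapter id

# Load the base planning model, then attach the affordance adapter on top of it.
base = AutoModelForVision2Seq.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, AFFORDANCE_LORA)
# Swapping in the T-LoRA adapter id here yields trajectory prediction instead,
# without retraining or reloading the base planning model.
```

Keeping the adapters separate means each capability can be fine-tuned or swapped independently of the base planning model.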

Quick Start & Requirements

  • Install: Clone the repository and set up a Conda environment (conda create -n robobrain python=3.10, conda activate robobrain, pip install -r requirements.txt).
  • Prerequisites: Python 3.10, Conda. Specific hardware requirements for training are not documented; the multi-stage training on long videos and high-resolution images suggests substantial GPU resources.
  • Inference: Supports Hugging Face (HF) and vLLM inference (see the sketch after this list). vLLM requires pip install vllm==0.6.6.post1.
  • Resources: Links to Hugging Face models and datasets are provided.
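For orientation, here is a minimal HF inference sketch (referenced from the Inference bullet above). It assumes the planning checkpoint loads through transformers' Auto classes and uses a placeholder repo id; the repository ships its own inference utilities, which should be treated as the canonical path.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "BAAI/RoboBrain"  # assumed Hugging Face repo id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("scene.jpg")  # an image of the manipulation scene
prompt = "What steps are needed to pick up the red mug and place it on the shelf?"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```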

Highlighted Details

  • CVPR 2025 Publication: The project is associated with a CVPR 2025 paper.
  • Modular Checkpoints: Offers separate LoRA checkpoints for Planning, Affordance (A-LoRA), and Trajectory (T-LoRA) prediction.
  • RoboBrain 2.0: A more powerful version with 7B and upcoming 32B parameter models is available.
  • Inference Options: Supports direct HF inference, vLLM for high throughput, and FlagScale for distributed deployment (a vLLM sketch follows this list).
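For the high-throughput path, a hypothetical vLLM offline-inference sketch under the pinned vllm==0.6.6.post1; the model id is a placeholder, and image inputs (which require vLLM's multimodal API) are omitted for brevity.

```python
from vllm import LLM, SamplingParams

# Assumed model id; multimodal (image) inputs are omitted in this sketch.
llm = LLM(model="BAAI/RoboBrain", trust_remote_code=True)
params = SamplingParams(temperature=0.2, max_tokens=256)

prompts = ["Plan the steps needed to stack the blue block on the green block."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

FlagScale deployment follows a separate, distributed setup and is not shown here.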

Maintenance & Community

The project is actively developed, with recent updates including RoboBrain 2.0 releases and checkpoint availability. It acknowledges contributions from projects like LLaVA-NeXT, lmms-eval, and vLLM. Links to Hugging Face, ModelScope, and the paper are provided.

Licensing & Compatibility

The repository does not explicitly state a license. However, the project is presented as open-source, with models available on Hugging Face and ModelScope. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify licensing details, which could impact commercial adoption. Detailed hardware requirements for training are not provided, suggesting potentially high resource needs. The project is presented as a research artifact, and stability for production environments is not guaranteed.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

29 stars in the last 30 days

Explore Similar Projects

Starred by Alex Yu (Research Scientist at OpenAI; former cofounder of Luma AI) and Thomas Wolf (cofounder of Hugging Face).

openpi by Physical-Intelligence

  • Robotics vision-language-action models
  • Top 1.5% on SourcePulse; 4k stars
  • created 9 months ago; updated 1 day ago