Being-H by BeingBeyond

Vision-language-action model for robot learning

Created 6 months ago
310 stars

Top 87.1% on SourcePulse

Project Summary

Being-H0.5 is a foundational Vision-Language-Action (VLA) model designed to enhance cross-embodiment generalization in robot control. It scales human-centric learning using the UniHand-2.0 dataset and a unified action space, aiming to provide robust and adaptable robotic manipulation capabilities for researchers and practitioners.

How It Works

The model scales human-centric robot learning by training on the UniHand-2.0 dataset and expressing actions in a unified action space, which lets policies adapt across diverse robotic hardware. Its architecture integrates visual perception, language understanding, and action generation, so that it can interpret instructions and environmental state and turn them into robot control commands.
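
To make the idea of a unified action space concrete, here is a minimal sketch, assuming a fixed-length vector with named slots and a validity mask. The slot names, their sizes, and the helpers `to_unified` and `from_unified` are hypothetical illustrations, not Being-H0.5's actual layout or API.

```python
# Hypothetical slot layout: names and index ranges are illustrative only.
UNIFIED_SLOTS = {
    "arm_joints": (0, 7),     # e.g. a 7-DoF arm
    "gripper": (7, 8),        # e.g. a parallel-jaw gripper
    "hand_fingers": (8, 20),  # e.g. a dexterous hand
}
UNIFIED_DIM = 20


def to_unified(parts):
    """Pack per-part action values into a fixed-length vector plus a mask.

    `parts` maps a slot name to its list of values. Slots the embodiment
    does not have stay at 0.0 and are marked False in the mask.
    """
    vector = [0.0] * UNIFIED_DIM
    mask = [False] * UNIFIED_DIM
    for name, values in parts.items():
        start, end = UNIFIED_SLOTS[name]
        if len(values) != end - start:
            raise ValueError(f"{name} expects {end - start} values, got {len(values)}")
        vector[start:end] = values
        mask[start:end] = [True] * (end - start)
    return vector, mask


def from_unified(vector, part_names):
    """Read back only the slots that a given embodiment actually uses."""
    return {name: vector[slice(*UNIFIED_SLOTS[name])] for name in part_names}


if __name__ == "__main__":
    # An arm-plus-gripper robot fills 8 of the 20 shared dimensions.
    vec, mask = to_unified({"arm_joints": [0.1] * 7, "gripper": [1.0]})
    print(sum(mask), "of", len(vec), "dimensions are active")
    print(from_unified(vec, ["gripper"]))
```

Under this assumption, robots with different hardware share one output shape, and the mask tells a training loss or a controller which dimensions to ignore for a given embodiment.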

Quick Start & Requirements

  • Installation: clone the repository, create a Conda environment with Python 3.10, then install the dependencies listed in requirements.txt along with flash-attn.
  • Inference requires a CUDA GPU (device cuda:0).
  • Pretrained models are available on Hugging Face.
  • Links: Blog, Paper, Hugging Face Models.
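
The requirements above can be checked with a small preflight script before attempting inference. This is a sketch, assuming the dependencies include PyTorch (import name `torch`) and that flash-attn is importable as `flash_attn`; `check_environment` is a hypothetical helper, not part of the Being-H repository.

```python
import importlib.util
import sys


def check_environment(required_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems = []

    # The summary names Python 3.10; other versions are flagged, not forbidden.
    current = tuple(sys.version_info[:2])
    if current != tuple(required_python):
        problems.append(
            f"Python {current[0]}.{current[1]} found, "
            f"expected {required_python[0]}.{required_python[1]}"
        )

    # Assumed import names for the key dependencies.
    for module in ("torch", "flash_attn"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing package: {module}")

    # Only query CUDA if PyTorch is installed and imports cleanly.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch

            if not torch.cuda.is_available():
                problems.append("no CUDA device visible; inference expects cuda:0")
        except Exception as exc:  # e.g. a broken or partial install
            problems.append(f"torch failed to import: {exc}")

    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        print("Environment problems:")
        for issue in issues:
            print("  -", issue)
    else:
        print("Environment looks ready for inference.")
```

Run it inside the Conda environment created during installation; it only inspects the interpreter and installed packages and does not download or load any model.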

Highlighted Details

  • Offers 2B-parameter VLA models on Hugging Face in three variants: base (preview), specialist (LIBERO, RoboCasa), and generalist.
  • Provides Python APIs and an inference server for robot policy execution and real-time control.
  • Includes evaluation scripts for LIBERO and RoboCasa benchmarks.
  • Supports post-training on custom robot data for adaptation to specific platforms.

Maintenance & Community

  • The project encourages contributions and collaboration.
  • It builds upon significant open-source projects like InternVL, Bagel, Qwen, LIBERO, and RoboCasa.
  • No direct community links (e.g., Discord, Slack) or explicit roadmap are provided in the README.

Licensing & Compatibility

  • Licensed under Apache 2.0, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The project is under active development, with several key features marked as "TODO," including complete pretraining/post-training scripts, detailed documentation, and out-of-the-box real robot checkpoints.
  • The primary "Being-H05-2B" model is noted as a preview.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 3

Star History

  • 118 stars in the last 30 days
