Being-H by BeingBeyond

Vision-language-action model for robot learning

Created 7 months ago

351 stars

Top 79.7% on SourcePulse

Project Summary

Being-H0.5 is a foundational Vision-Language-Action (VLA) model designed to enhance cross-embodiment generalization in robot control. It scales human-centric learning using the UniHand-2.0 dataset and a unified action space, aiming to provide robust and adaptable robotic manipulation capabilities for researchers and practitioners.

How It Works

Being-H0.5 scales human-centric robot learning by combining the UniHand-2.0 dataset with a unified action space, so that learned policies can adapt across diverse robotic hardware rather than being tied to one embodiment. The architecture integrates visual perception, language understanding, and action generation: it interprets language instructions together with the observed environment state and outputs actions for robot control.
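To make the idea of a unified action space concrete, here is a minimal sketch of one way different embodiments can share a single action vector. It is an illustration only and not Being-H's actual layout: the slot names, slot sizes, and the `to_unified` / `from_unified` helpers are all hypothetical.

```python
"""Toy unified action space. Illustrative only; not Being-H's real layout."""

# Hypothetical slot layout: each named slot owns a fixed [start, end) range
# of one shared action vector. Names and sizes are assumptions.
UNIFIED_SLOTS = {
    "arm_joints": (0, 7),
    "gripper": (7, 8),
    "hand_joints": (8, 24),
}
UNIFIED_DIM = 24


def to_unified(parts):
    """Pack per-embodiment action parts into the shared vector plus a validity mask."""
    vec = [0.0] * UNIFIED_DIM
    mask = [False] * UNIFIED_DIM
    for name, values in parts.items():
        start, end = UNIFIED_SLOTS[name]
        if len(values) > end - start:
            raise ValueError(f"{name}: got {len(values)} values, slot holds {end - start}")
        for offset, value in enumerate(values):
            vec[start + offset] = float(value)
            mask[start + offset] = True  # mark this dimension as used
    return vec, mask


def from_unified(vec, mask, slot_names):
    """Read back only the dimensions that a given embodiment actually uses."""
    out = {}
    for name in slot_names:
        start, end = UNIFIED_SLOTS[name]
        out[name] = [vec[i] for i in range(start, end) if mask[i]]
    return out


# A parallel-gripper arm and a dexterous hand share one vector format.
gripper_arm = to_unified({"arm_joints": [0.1] * 7, "gripper": [1.0]})
dex_hand = to_unified({"hand_joints": [0.0] * 16})
```

The mask lets a single policy head emit a fixed-size vector while each robot reads only the dimensions it owns, which is one common way to share a model across embodiments.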

Quick Start & Requirements

Installation involves cloning the repository, creating a Conda environment with Python 3.10, and installing dependencies via requirements.txt and flash-attn.
Inference requires a CUDA-capable GPU (device cuda:0).
Pretrained models are available on Hugging Face.
Links: Blog, Paper, Hugging Face Models.
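The requirements above can be checked before running inference. The following is a minimal preflight sketch using only the Python standard library. The `preflight` helper is hypothetical and not part of the repository, and the module list is an assumption: `flash_attn` is the import name of the flash-attn package, and `torch` is assumed because flash-attn depends on PyTorch.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 10)                   # Python version named in the setup steps
REQUIRED_MODULES = ("torch", "flash_attn")  # assumed import names, see lead-in


def preflight(version=None, modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if version != REQUIRED_PYTHON:
        problems.append(
            f"expected Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}, "
            f"found {version[0]}.{version[1]}"
        )
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("environment looks ready" if not issues else "\n".join(issues))
```

This only confirms that the interpreter version and packages are present. It does not verify that a GPU is visible, which still has to be checked through the installed deep-learning framework.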

Highlighted Details

Offers 2B-parameter VLA models on Hugging Face in three variants: base (preview), specialist (LIBERO, RoboCasa), and generalist.
Provides Python APIs and an inference server for robot policy execution and real-time control.
Includes evaluation scripts for LIBERO and RoboCasa benchmarks.
Supports post-training on custom robot data for adaptation to specific platforms.

Maintenance & Community

The project encourages contributions and collaboration.
It builds upon significant open-source projects like InternVL, Bagel, Qwen, LIBERO, and RoboCasa.
No direct community links (e.g., Discord, Slack) or explicit roadmap are provided in the README.

Licensing & Compatibility

Licensed under Apache 2.0, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under active development, with several key features marked as "TODO," including complete pretraining/post-training scripts, detailed documentation, and out-of-the-box real robot checkpoints.
The primary "Being-H05-2B" model is noted as a preview.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 5

Star History

13 stars in the last 30 days

