GPT-style model for visual robot manipulation research
GR-1 provides a GPT-style transformer model for language-conditioned visual robot manipulation, enabling end-to-end prediction of robot actions and future visual states. It is aimed at researchers and engineers working on robotic control, and reports improved task success rates and stronger zero-shot generalization on the CALVIN benchmark and on real robots.
How It Works
GR-1 adopts a GPT-style architecture that takes a language instruction, a sequence of observation images, and robot states as input, and predicts both the next robot action and future video frames. The model is first pre-trained at scale on generative video prediction, which teaches it transferable visual representations; it is then fine-tuned on robot data, yielding strong generalization across diverse tasks and environments.
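A minimal sketch of this forward pass, assuming PyTorch; every class and argument name below (Gr1Sketch, text_dim, and so on) is illustrative rather than the repository's actual API:

```python
import torch
import torch.nn as nn

class Gr1Sketch(nn.Module):
    """Toy GPT-style model: language + image + state tokens in, action + next frame out."""
    def __init__(self, d_model=512, n_layers=4, n_heads=8,
                 text_dim=768, patch_dim=768, state_dim=7, action_dim=7):
        super().__init__()
        # Project each modality into the shared transformer width.
        self.text_proj = nn.Linear(text_dim, d_model)
        self.image_proj = nn.Linear(patch_dim, d_model)
        self.state_proj = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Two heads: one for the robot action, one for future-frame features.
        self.action_head = nn.Linear(d_model, action_dim)
        self.frame_head = nn.Linear(d_model, patch_dim)

    def forward(self, text_emb, image_feats, state):
        # text_emb:    (B, T_text, text_dim)  embedded language instruction
        # image_feats: (B, T_img, patch_dim)  per-frame visual features (e.g. from MAE)
        # state:       (B, 1, state_dim)      current robot proprioceptive state
        tokens = torch.cat([self.text_proj(text_emb),
                            self.image_proj(image_feats),
                            self.state_proj(state)], dim=1)
        # A causal mask makes attention autoregressive, as in a GPT decoder.
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(tokens, mask=causal)
        # Read both predictions off the final token.
        return self.action_head(h[:, -1]), self.frame_head(h[:, -1])

model = Gr1Sketch()
action, next_frame = model(torch.randn(1, 8, 768),   # 8 language tokens
                           torch.randn(1, 10, 768),  # 10 frames of features
                           torch.randn(1, 1, 7))     # one state vector
print(action.shape, next_frame.shape)  # torch.Size([1, 7]) torch.Size([1, 768])
```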
Quick Start & Requirements
Setup involves running install.sh to install dependencies, placing the downloaded pre-trained checkpoints in the .logs/ directory, and running evaluate_calvin.sh with the paths to the CALVIN data and MAE checkpoints.
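A plausible shell walkthrough of those steps; the argument layout of evaluate_calvin.sh and all paths below are placeholders to check against the actual scripts:

```bash
# Run from the repository root. Flags and paths are illustrative, not verified.
bash install.sh             # installs dependencies, including the CALVIN setup
mkdir -p .logs              # put the downloaded GR-1 checkpoints here
# Point the script at your local CALVIN dataset and MAE checkpoint:
bash evaluate_calvin.sh /path/to/calvin/dataset /path/to/mae_checkpoint.pth
```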
Maintenance & Community
The project is associated with ByteDance and lists several authors, with contributions indicated by asterisks. The README does not document further community or maintenance channels, and the repository has seen no updates for about a year and is marked inactive.
Licensing & Compatibility
Licensed under the Apache-2.0 License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project depends heavily on the CALVIN benchmark and its environment setup, which can be a barrier to entry. Pre-trained GR-1 weights are provided only for the ABCD-D and ABC-D data splits, so users should confirm these match their intended evaluation or fine-tuning scenario.