GR-1  by bytedance

GPT-style model for visual robot manipulation research

created 1 year ago
271 stars

Top 95.0% on SourcePulse

Project Summary

GR-1 is a GPT-style transformer model for language-conditioned visual robot manipulation that predicts robot actions and future visual states end to end. It targets researchers and engineers working on robotic control, and reports higher task success rates and stronger zero-shot generalization on the CALVIN benchmark and on real robots.

How It Works

GR-1 uses a GPT-style architecture that takes a language instruction, a sequence of observation images, and robot states as input, and predicts both robot actions and future frames. The model is first pre-trained on large-scale video generation to learn robust representations and is then fine-tuned on specific robot datasets, which is credited for its generalization across diverse tasks and environments.
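The input layout described above can be pictured as one flat token sequence per episode window. The following is a minimal sketch using symbolic labels only. The exact ordering, the repetition of the language token at every step, the number of image tokens, and the query-token names (`OBS_QUERY`, `ACT_QUERY`) are assumptions made for illustration and are not taken from the GR-1 repository.

```python
from typing import List


def build_token_sequence(num_steps: int, patches_per_image: int = 2) -> List[str]:
    """Return symbolic token labels for `num_steps` timesteps of history.

    Hypothetical layout: each timestep contributes a language token, a robot
    state token, some image tokens, and two query tokens whose outputs would
    be decoded into a future frame and an action.
    """
    tokens: List[str] = []
    for t in range(num_steps):
        tokens.append("LANG")                    # language instruction embedding
        tokens.append(f"STATE_{t}")              # robot state at step t
        tokens.extend(                           # observation image tokens at step t
            f"IMG_{t}_{p}" for p in range(patches_per_image)
        )
        tokens.append(f"OBS_QUERY_{t}")          # read out: predicted future frame
        tokens.append(f"ACT_QUERY_{t}")          # read out: predicted action
    return tokens


if __name__ == "__main__":
    for label in build_token_sequence(num_steps=2, patches_per_image=1):
        print(label)
```

In a GPT-style model, a causal attention mask over such a sequence would let each step's predictions depend only on the current and earlier observations.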

Quick Start & Requirements

  • Install by following the CALVIN repo's installation steps, then run install.sh to install the additional dependencies.
  • Requires the CALVIN dataset and MAE ViT-Base pre-trained weights.
  • To evaluate, download the GR-1 weights, place them in the logs/ directory, and run evaluate_calvin.sh with the paths to the CALVIN data and the MAE checkpoint.
  • Official CALVIN repo: https://github.com/calvin-robot/calvin
  • MAE repo: https://github.com/facebookresearch/mae
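Because evaluation depends on several downloaded artifacts, a small pre-flight check can catch a missing file before a long run starts. This is a sketch, not part of the GR-1 repository: the `find_missing` helper and the two `/path/to/...` locations are hypothetical placeholders, and only `logs/` and `evaluate_calvin.sh` are names mentioned above.

```python
from pathlib import Path
from typing import Dict, List


def find_missing(required: Dict[str, Path]) -> List[str]:
    """Return the names of required files or directories that do not exist."""
    return [name for name, path in required.items() if not path.exists()]


if __name__ == "__main__":
    required = {
        # Hypothetical locations: replace with your own paths.
        "CALVIN dataset": Path("/path/to/calvin/dataset"),
        "MAE ViT-Base checkpoint": Path("/path/to/mae_checkpoint"),
        # Names taken from the steps above, relative to the repo root.
        "GR-1 weights directory": Path("logs"),
        "evaluation script": Path("evaluate_calvin.sh"),
    }
    missing = find_missing(required)
    if missing:
        print("Missing before evaluation:", ", ".join(missing))
    else:
        print("All required paths found.")
```

Run it from the repository root so the relative paths resolve as intended.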

Highlighted Details

  • Improves CALVIN benchmark success rate from 88.9% to 94.9%.
  • Enhances zero-shot unseen scene generalization success rate from 53.3% to 85.4%.
  • Demonstrates strong performance on real robot experiments and generalization to unseen objects.
  • Offers a unified transformer architecture for multi-task visual robot manipulation.
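To put the two reported success rates in perspective, the short calculation below derives the absolute gain, the relative gain, and the share of previous failures eliminated. It uses only the four percentages listed above; the `gains` helper is a hypothetical name introduced for this example.

```python
from typing import Tuple


def gains(before: float, after: float) -> Tuple[float, float, float]:
    """Return (absolute gain in points, relative gain in %, failure reduction in %)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    failure_reduction = 100.0 * absolute / (100.0 - before)
    return round(absolute, 1), round(relative, 1), round(failure_reduction, 1)


if __name__ == "__main__":
    print("CALVIN benchmark:", gains(88.9, 94.9))        # (6.0, 6.7, 54.1)
    print("Zero-shot unseen scene:", gains(53.3, 85.4))  # (32.1, 60.2, 68.7)
```

Read this way, the benchmark result removes roughly half of the remaining failures, and the unseen-scene result removes roughly two thirds.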

Maintenance & Community

The project comes from ByteDance. The README lists several authors and marks their contributions with asterisks, but it does not provide further community or maintenance details.

Licensing & Compatibility

Licensed under the Apache-2.0 License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project relies heavily on the CALVIN benchmark and its associated environment setup, which may present a barrier to entry. Pre-trained weights for GR-1 are provided for specific data splits (ABCD-D and ABC-D), and users must ensure compatibility with their intended evaluation or fine-tuning scenarios.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days
