GPT-style model for visual robot manipulation research
GR-1 provides a GPT-style transformer model for language-conditioned visual robot manipulation, enabling end-to-end prediction of robot actions and future visual states. It is aimed at researchers and engineers working on robotic control, and reports improved task success rates and stronger zero-shot generalization on the CALVIN benchmark and on real robots.
How It Works
GR-1 adopts a GPT-style architecture that takes a language instruction, a sequence of observation images, and robot states as input, and predicts both the next robot action and future video frames. The model is first pre-trained at scale on generative video prediction, which teaches it transferable visual representations; it is then fine-tuned on robot data, yielding strong generalization across diverse tasks and environments.
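A minimal sketch of this forward pass, assuming PyTorch; every class and argument name below (Gr1Sketch, text_dim, and so on) is illustrative rather than the repository's actual API:

```python
import torch
import torch.nn as nn

class Gr1Sketch(nn.Module):
    """Toy GPT-style model: language + image + state tokens in, action + next frame out."""
    def __init__(self, d_model=512, n_layers=4, n_heads=8,
                 text_dim=768, patch_dim=768, state_dim=7, action_dim=7):
        super().__init__()
        # Project each modality into the shared transformer width.
        self.text_proj = nn.Linear(text_dim, d_model)
        self.image_proj = nn.Linear(patch_dim, d_model)
        self.state_proj = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Two heads: one for the robot action, one for future-frame features.
        self.action_head = nn.Linear(d_model, action_dim)
        self.frame_head = nn.Linear(d_model, patch_dim)

    def forward(self, text_emb, image_feats, state):
        # text_emb:    (B, T_text, text_dim)  embedded language instruction
        # image_feats: (B, T_img, patch_dim)  per-frame visual features (e.g. from MAE)
        # state:       (B, 1, state_dim)      current robot proprioceptive state
        tokens = torch.cat([self.text_proj(text_emb),
                            self.image_proj(image_feats),
                            self.state_proj(state)], dim=1)
        # A causal mask makes attention autoregressive, as in a GPT decoder.
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(tokens, mask=causal)
        # Read both predictions off the final token.
        return self.action_head(h[:, -1]), self.frame_head(h[:, -1])

model = Gr1Sketch()
action, next_frame = model(torch.randn(1, 8, 768),   # 8 language tokens
                           torch.randn(1, 10, 768),  # 10 frames of features
                           torch.randn(1, 1, 7))     # one state vector
print(action.shape, next_frame.shape)  # torch.Size([1, 7]) torch.Size([1, 768])
```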
Quick Start & Requirements
Setup involves running install.sh to install dependencies, placing the downloaded pre-trained checkpoints in the .logs/ directory, and running evaluate_calvin.sh with the paths to the CALVIN data and MAE checkpoints.
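A plausible shell walkthrough of those steps; the argument layout of evaluate_calvin.sh and all paths below are placeholders to check against the actual scripts:

```bash
# Run from the repository root. Flags and paths are illustrative, not verified.
bash install.sh             # installs dependencies, including the CALVIN setup
mkdir -p .logs              # put the downloaded GR-1 checkpoints here
# Point the script at your local CALVIN dataset and MAE checkpoint:
bash evaluate_calvin.sh /path/to/calvin/dataset /path/to/mae_checkpoint.pth
```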
Maintenance & Community
The project is associated with ByteDance and lists several authors, with contributions indicated by asterisks. The README does not document further community or maintenance channels, and the repository has seen no updates for about a year and is marked inactive.
Licensing & Compatibility
Licensed under the Apache-2.0 License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project depends heavily on the CALVIN benchmark and its environment setup, which can be a barrier to entry. Pre-trained GR-1 weights are provided only for the ABCD-D and ABC-D data splits, so users should confirm these match their intended evaluation or fine-tuning scenario.