PyTorch dataset tooling for multimodal embodied AI
This repository provides the codebase for EmbodiedGPT, a vision-language pre-training model that leverages embodied chain-of-thought reasoning. It is designed for researchers and engineers working on multimodal AI, offering a flexible framework for training on diverse datasets including images, videos, and text.
How It Works
The core of the library is built around PyTorch's `Dataset` and `DataLoader` abstractions. It introduces `BaseDataset` for handling heterogeneous media types (images, videos, text) with standardized transformations and task-specific processing, and `WeightedConcatDataset` for combining multiple datasets with adjustable weights, enabling balanced training across different data sources and tasks. This modular design makes it straightforward to customize the pipeline and integrate it into existing PyTorch training loops.
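To make the weighting idea concrete, here is a minimal, self-contained sketch of balanced sampling across two sources built from standard PyTorch primitives (`ConcatDataset` plus `WeightedRandomSampler`). The repository's own `WeightedConcatDataset` API may differ; the toy datasets and weights below are illustrative assumptions, not the actual interface.

```python
# Minimal sketch of weighted dataset mixing using standard PyTorch
# primitives (ConcatDataset + WeightedRandomSampler). The repo's own
# WeightedConcatDataset API may differ; datasets and weights are toys.
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Two toy sources standing in for, e.g., an image-caption set and a video set.
ds_a = TensorDataset(torch.randn(100, 8))
ds_b = TensorDataset(torch.randn(400, 8))
combined = ConcatDataset([ds_a, ds_b])

# Inverse-size weights: each source contributes roughly equally per epoch,
# even though ds_b is 4x larger (the "balanced training" behavior above).
weights = torch.cat([
    torch.full((len(ds_a),), 1.0 / len(ds_a)),
    torch.full((len(ds_b),), 1.0 / len(ds_b)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(combined),
                                replacement=True)

loader = DataLoader(combined, batch_size=16, sampler=sampler)
for (features,) in loader:
    ...  # feed the mixed batch to the training step
```

Inverse-size weights are one common way to realize balanced mixing; raising a source's weight oversamples it relative to its size.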
Quick Start & Requirements
- Follow the setup steps in INSTALLATION.md.
- Extract datasets_share.zip to ./datasets/.
- Download the Embodied_family_7btiny model assets.
- Full requirements and details are in INSTALLATION.md.
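As a quick post-setup sanity check, the hypothetical snippet below only verifies that the data archive was extracted; it assumes nothing about the repository's Python API beyond the ./datasets/ path named above.

```python
# Hypothetical smoke test: confirm datasets_share.zip was extracted to
# ./datasets/ before launching training. Uses only the standard library.
from pathlib import Path

data_root = Path("./datasets")
if not data_root.is_dir():
    raise SystemExit("Missing ./datasets/ -- extract datasets_share.zip first")
print("Found dataset folders:", sorted(p.name for p in data_root.iterdir()))
```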
Highlighted Details
- Built directly on PyTorch's `Dataset` and `DataLoader` abstractions.
- `WeightedConcatDataset` allows weighted combination of multiple datasets.
- `BaseDataset` standardizes handling of images, videos, and text (see the sketch after this list).
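The `BaseDataset` pattern can be pictured as a single dataset class that normalizes heterogeneous media behind one `__getitem__` contract. The sketch below is a minimal illustration under that assumption; the class name, record layout, and fields are invented for the example and are not the repository's actual API.

```python
# Minimal sketch of a BaseDataset-style class: one uniform __getitem__
# contract over heterogeneous media. Class name, record layout, and fields
# are invented for illustration and are not the repository's actual API.
from typing import Callable, List, Optional, Tuple
import torch
from torch.utils.data import Dataset

class MediaDataset(Dataset):
    """Yields dicts with one schema whether the sample is an image or a video."""

    def __init__(self, records: List[Tuple[str, torch.Tensor, str]],
                 transform: Optional[Callable] = None):
        # Each record is (media_type, media_tensor, text), where media_type
        # is "image" with shape (C, H, W) or "video" with shape (T, C, H, W).
        self.records = records
        self.transform = transform

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> dict:
        media_type, media, text = self.records[idx]
        if media_type == "image":
            # Promote images to (T=1, C, H, W) so both media types share
            # one tensor layout downstream.
            media = media.unsqueeze(0)
        if self.transform is not None:
            media = self.transform(media)
        return {"media": media, "text": text, "type": media_type}
```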
Maintenance & Community
The project accompanies the paper "EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought", published at NeurIPS 2023.
Licensing & Compatibility
Limitations & Caveats
The README indicates that instructions will be updated soon, suggesting the documentation might be incomplete or subject to change.