EmbodiedGPT: PyTorch dataset tooling for multimodal embodied AI
This repository provides the codebase for EmbodiedGPT, a vision-language pre-training model that leverages embodied chain-of-thought reasoning. It is designed for researchers and engineers working on multimodal AI, offering a flexible framework for training on diverse datasets including images, videos, and text.
How It Works
The core of the library is built around PyTorch's Dataset and DataLoader. It introduces BaseDataset for handling heterogeneous media types (images, videos, text) with standardized transformations and task-specific processing. The WeightedConcatDataset allows for combining multiple datasets with adjustable weights, enabling balanced training across different data sources and tasks. This modular design facilitates customization and integration into existing PyTorch training pipelines.
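The weighted-combination idea can be sketched as follows. This is an illustrative example, not the repository's actual API: the class name, record format, and transforms are hypothetical, and a real implementation would subclass torch.utils.data.Dataset rather than a plain class (kept dependency-free here so the snippet is self-contained).

```python
# Sketch of weighted dataset concatenation (illustrative, not the repo's API).
# In the real codebase this would subclass torch.utils.data.Dataset.
import random

class WeightedConcatDatasetSketch:
    """Draw samples from several datasets with probability proportional to weights."""

    def __init__(self, datasets, weights, seed=0):
        assert len(datasets) == len(weights)
        self.datasets = datasets
        total = float(sum(weights))
        self.probs = [w / total for w in weights]
        self.rng = random.Random(seed)  # seeded for reproducible sampling

    def __len__(self):
        # One epoch covers the combined length of all sources.
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, idx):
        # Pick a source dataset according to the weights,
        # then a random sample within that source.
        (ds,) = self.rng.choices(self.datasets, weights=self.probs, k=1)
        return ds[self.rng.randrange(len(ds))]

# Toy "datasets": lists of (media_type, index) tuples.
image_ds = [("image", i) for i in range(100)]
video_ds = [("video", i) for i in range(10)]
mixed = WeightedConcatDatasetSketch([image_ds, video_ds], weights=[3, 1])
print(len(mixed))  # 110
```

Because sampling is driven by weights rather than dataset size, a small video dataset can still contribute a fixed fraction of each training epoch.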
Quick Start & Requirements
Installation steps are described in INSTALLATION.md. Extract datasets_share.zip to ./datasets/ before training. Instructions for obtaining the Embodied_family_7btiny model are also covered in INSTALLATION.md.
Highlighted Details
Built on PyTorch's Dataset and DataLoader abstractions. WeightedConcatDataset allows for weighted combination of multiple datasets, while BaseDataset standardizes transformations across images, videos, and text.
Maintenance & Community
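The media-dispatch role of a BaseDataset-style class can be illustrated with a minimal sketch. All names and transforms here are hypothetical stand-ins, not the repository's real implementation; in practice each transform would produce tensors and the class would subclass torch.utils.data.Dataset.

```python
# Hypothetical sketch of per-media-type dispatch in a BaseDataset-style class.
# The transforms below are placeholders; real ones would return tensors.
class BaseDatasetSketch:
    def __init__(self, records):
        # Each record: {"media": "image" | "video" | "text", "data": ...}
        self.records = records
        self.transforms = {
            "image": lambda x: f"img_tensor({x})",
            "video": lambda x: f"frame_stack({x})",
            "text": lambda x: x.lower(),
        }

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        # Route each sample through the transform for its media type,
        # so a single DataLoader can iterate heterogeneous data.
        rec = self.records[idx]
        return self.transforms[rec["media"]](rec["data"])

ds = BaseDatasetSketch([
    {"media": "text", "data": "Pick Up The Cup"},
    {"media": "image", "data": "frame_001.png"},
])
print(ds[0])  # pick up the cup
```

Centralizing the dispatch this way is what lets one training loop consume images, videos, and text without per-source special cases.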
The project is associated with the paper "EmbodiedGPT: Vision-language pre-training via embodied chain of thought" published at NeurIPS 2023.
Licensing & Compatibility
Limitations & Caveats
The README indicates that instructions will be updated soon, suggesting the documentation might be incomplete or subject to change.
The repository has not been updated in about a year and appears inactive, so issues may go unaddressed.