PyTorch dataset tooling for multimodal embodied AI
This repository provides the codebase for EmbodiedGPT, a vision-language pre-training model that leverages embodied chain-of-thought reasoning. It is designed for researchers and engineers working on multimodal AI, offering a flexible framework for training on diverse datasets including images, videos, and text.
How It Works
The core of the library is built around PyTorch's `Dataset` and `DataLoader` abstractions. It introduces `BaseDataset` for handling heterogeneous media types (images, videos, text) with standardized transformations and task-specific processing, and `WeightedConcatDataset` for combining multiple datasets with adjustable weights, enabling balanced training across different data sources and tasks. This modular design makes it straightforward to customize the pipeline and integrate it into existing PyTorch training loops.
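To make the weighting idea concrete, here is a minimal, self-contained sketch of balanced sampling across two sources built from standard PyTorch primitives (`ConcatDataset` plus `WeightedRandomSampler`). The repository's own `WeightedConcatDataset` API may differ; the toy datasets and weights below are illustrative assumptions, not the actual interface.

```python
# Minimal sketch of weighted dataset mixing using standard PyTorch
# primitives (ConcatDataset + WeightedRandomSampler). The repo's own
# WeightedConcatDataset API may differ; datasets and weights are toys.
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Two toy sources standing in for, e.g., an image-caption set and a video set.
ds_a = TensorDataset(torch.randn(100, 8))
ds_b = TensorDataset(torch.randn(400, 8))
combined = ConcatDataset([ds_a, ds_b])

# Inverse-size weights: each source contributes roughly equally per epoch,
# even though ds_b is 4x larger (the "balanced training" behavior above).
weights = torch.cat([
    torch.full((len(ds_a),), 1.0 / len(ds_a)),
    torch.full((len(ds_b),), 1.0 / len(ds_b)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(combined),
                                replacement=True)

loader = DataLoader(combined, batch_size=16, sampler=sampler)
for (features,) in loader:
    ...  # feed the mixed batch to the training step
```

Inverse-size weights are one common way to realize balanced mixing; raising a source's weight oversamples it relative to its size.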
Quick Start & Requirements
- Follow the setup steps in INSTALLATION.md.
- Extract datasets_share.zip to ./datasets/.
- Download the Embodied_family_7btiny model assets.
- Full requirements and details are in INSTALLATION.md.
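As a quick post-setup sanity check, the hypothetical snippet below only verifies that the data archive was extracted; it assumes nothing about the repository's Python API beyond the ./datasets/ path named above.

```python
# Hypothetical smoke test: confirm datasets_share.zip was extracted to
# ./datasets/ before launching training. Uses only the standard library.
from pathlib import Path

data_root = Path("./datasets")
if not data_root.is_dir():
    raise SystemExit("Missing ./datasets/ -- extract datasets_share.zip first")
print("Found dataset folders:", sorted(p.name for p in data_root.iterdir()))
```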
Highlighted Details
- Built directly on PyTorch's `Dataset` and `DataLoader` abstractions.
- `WeightedConcatDataset` allows weighted combination of multiple datasets.
- `BaseDataset` standardizes handling of images, videos, and text (see the sketch after this list).
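The `BaseDataset` pattern can be pictured as a single dataset class that normalizes heterogeneous media behind one `__getitem__` contract. The sketch below is a minimal illustration under that assumption; the class name, record layout, and fields are invented for the example and are not the repository's actual API.

```python
# Minimal sketch of a BaseDataset-style class: one uniform __getitem__
# contract over heterogeneous media. Class name, record layout, and fields
# are invented for illustration and are not the repository's actual API.
from typing import Callable, List, Optional, Tuple
import torch
from torch.utils.data import Dataset

class MediaDataset(Dataset):
    """Yields dicts with one schema whether the sample is an image or a video."""

    def __init__(self, records: List[Tuple[str, torch.Tensor, str]],
                 transform: Optional[Callable] = None):
        # Each record is (media_type, media_tensor, text), where media_type
        # is "image" with shape (C, H, W) or "video" with shape (T, C, H, W).
        self.records = records
        self.transform = transform

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> dict:
        media_type, media, text = self.records[idx]
        if media_type == "image":
            # Promote images to (T=1, C, H, W) so both media types share
            # one tensor layout downstream.
            media = media.unsqueeze(0)
        if self.transform is not None:
            media = self.transform(media)
        return {"media": media, "text": text, "type": media_type}
```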
Maintenance & Community
The project accompanies the paper "EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought", published at NeurIPS 2023.
Licensing & Compatibility
Limitations & Caveats
The README indicates that instructions will be updated soon, suggesting the documentation might be incomplete or subject to change.