unified-io-2 by allenai

Unified-IO 2 code for training, inference, and demo

Created 2 years ago

641 stars

Top 51.9% on SourcePulse

Project Summary

Unified-IO 2 is a multimodal foundation model designed for researchers and practitioners working with diverse data types including vision, language, audio, and action. It offers a unified framework for training and inference across these modalities, building upon the T5X codebase and providing pre-trained checkpoints for various model sizes.

How It Works

Unified-IO 2 handles multiple modalities autoregressively within a single model. Its preprocessing pipeline has three stages: task-specific steps (e.g., resizing images, converting audio to mel-spectrograms), modality-general preprocessing (tokenization, handling missing modalities), and a feature converter that produces consistent, fixed-size tensors suitable for JAX. Sharing one data pipeline and one model architecture across all modalities is what enables cross-modal learning and generation.
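The repository's actual feature converter is not reproduced here. The following is only a minimal NumPy sketch of the idea behind the last stage: variable-length, possibly missing per-modality token sequences are padded or truncated to fixed lengths, with a mask marking real positions. The names MAX_LENS, pad_or_truncate, and convert_features, and the lengths themselves, are hypothetical illustrations, not the project's API.

```python
from typing import Dict, Optional, Sequence, Tuple

import numpy as np

# Hypothetical fixed sequence lengths per modality (the real values live in the repo's configs).
MAX_LENS = {"text": 8, "image": 4, "audio": 4}


def pad_or_truncate(
    tokens: Optional[Sequence[int]], max_len: int, pad_id: int = 0
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (ids, mask), both of shape [max_len].

    A missing modality (tokens is None) becomes all padding with an all-zero mask,
    so every example has the same static shapes, which JAX compilation prefers.
    """
    ids = np.full((max_len,), pad_id, dtype=np.int32)
    mask = np.zeros((max_len,), dtype=np.int32)
    if tokens is not None:
        n = min(len(tokens), max_len)          # truncate sequences that are too long
        ids[:n] = np.asarray(tokens[:n], dtype=np.int32)
        mask[:n] = 1                           # 1 marks real tokens, 0 marks padding
    return ids, mask


def convert_features(example: Dict[str, Optional[Sequence[int]]]) -> Dict[str, np.ndarray]:
    """Map an example with any subset of modalities to fixed-size arrays."""
    out: Dict[str, np.ndarray] = {}
    for name, max_len in MAX_LENS.items():
        ids, mask = pad_or_truncate(example.get(name), max_len)
        out[f"{name}_ids"] = ids
        out[f"{name}_mask"] = mask
    return out
```

For example, convert_features({"text": [5, 6, 7], "image": None}) yields text_ids of length 8 with a mask of three ones, while the image and audio entries are all padding with zero masks. Keeping shapes constant avoids recompiling the model for every new input shape.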

Quick Start & Requirements

Install: pip install -e . (for GPU/CPU) or pip install -e '.[tpu]' (for TPU). Additional dependencies for the demo: pip install -e '.[demo]'.
Prerequisites: Python 3.8+ recommended, although Python 3.9 may run into orbax.checkpoint conflicts. CUDA for GPU. The LLaMA tokenizer model file is required.
Checkpoints: Available on S3 (e.g., s3://ai2-prior-uio/public/uio2-checkpoints/large-3m). Download command example: aws s3 --no-sign-request cp --recursive s3://ai2-prior-uio/public/uio2-checkpoints/large-3m large-3m --exclude "state*".
Demo: Run jupyter notebook demo.ipynb after installing demo dependencies.
Docs: https://github.com/allenai/unified-io-2
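To script the checkpoint download from Python, the README's aws command can be expressed as an argument list for subprocess. This is a minimal sketch: only the S3 prefix and the large-3m checkpoint name come from the README; the helper names build_download_cmd and download are hypothetical, and other checkpoint names are not confirmed here. It assumes the AWS CLI is installed and on PATH.

```python
import shlex
import subprocess
from typing import List

# S3 prefix quoted in the README; "large-3m" is the only checkpoint name it gives as an example.
S3_PREFIX = "s3://ai2-prior-uio/public/uio2-checkpoints"


def build_download_cmd(checkpoint: str = "large-3m", dest: str = "") -> List[str]:
    """Build the argv for the README's unsigned, recursive `aws s3 cp`."""
    dest = dest or checkpoint  # default: a local directory named after the checkpoint
    return [
        "aws", "s3", "--no-sign-request",
        "cp", "--recursive",
        f"{S3_PREFIX}/{checkpoint}", dest,
        # No shell is involved, so "state*" needs no extra quoting to avoid globbing.
        "--exclude", "state*",
    ]


def download(checkpoint: str = "large-3m", dest: str = "", dry_run: bool = True) -> List[str]:
    """Print the equivalent shell command (dry run) or actually run it."""
    cmd = build_download_cmd(checkpoint, dest)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Calling download() prints the command without touching the network; download(dry_run=False) runs it.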

Highlighted Details

Supports vision, language, audio, and action modalities.
Codebase is modified from T5X, enabling TPU training.
Offers checkpoints in T5X format for XXL, XL, and Large model sizes.
Includes data preprocessing scripts and visualization tools.
Training and evaluation scripts are provided, configurable via gin files.

Maintenance & Community

The project is from the Allen Institute for AI (AI2). Specific community channels (Discord/Slack) or roadmap details are not explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the affiliation with AI2 and the use of T5X, it is likely to be permissive, but users should verify. Compatibility with commercial or closed-source projects would require license confirmation.

Limitations & Caveats

GPU/CPU setups are noted as not well-tested, with a primary focus on TPUs. Python 3.9 may encounter compatibility issues with orbax.checkpoint and JAX. Some datasets require manual preprocessing steps beyond the provided scripts.
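Given the Python 3.9 caveat, a quick pre-flight check can save a failed setup. This is a minimal sketch, not part of the repository: the function names are hypothetical, the version thresholds simply restate the caveats above, and it only reports the installed jax and orbax-checkpoint versions rather than asserting specific compatible ones (none are stated here).

```python
import sys
import warnings
from importlib import metadata
from typing import Optional, Sequence


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_environment(version_info: Sequence[int] = tuple(sys.version_info[:2])) -> bool:
    """Fail on Python < 3.8, warn on 3.9, and report relevant package versions."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) < (3, 8):
        raise RuntimeError(f"Python {major}.{minor} is older than the recommended 3.8+")
    ok = True
    if (major, minor) == (3, 9):
        warnings.warn("Python 3.9 may hit orbax.checkpoint / JAX conflicts", RuntimeWarning)
        ok = False
    # Report versions only; compatible version pairs are not specified in the README summary.
    for dist in ("jax", "orbax-checkpoint"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
    return ok
```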

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

10 stars in the last 30 days