unified-io-2  by allenai

Unified-IO 2 code for training, inference, and demo

created 1 year ago
619 stars

Top 54.1% on sourcepulse

Project Summary

Unified-IO 2 is a multimodal foundation model designed for researchers and practitioners working with diverse data types including vision, language, audio, and action. It offers a unified framework for training and inference across these modalities, building upon the T5X codebase and providing pre-trained checkpoints for various model sizes.

How It Works

Unified-IO 2 uses a single autoregressive model to handle all modalities. Its preprocessing pipeline has three stages: task-specific steps (resizing images, converting audio to mel-spectrograms), modality-general preprocessing (tokenization, handling missing modalities), and a feature converter that produces consistent, fixed-size tensors suitable for JAX. Sharing one data pipeline and one model architecture across modalities allows for efficient cross-modal learning and generation.
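The feature-converter idea can be illustrated with a minimal sketch. This is not the project's code: the modality names, the fixed lengths, the padding id, and the helper names (pad_or_truncate, convert_features) are all assumptions chosen for illustration. It shows why the last stage matters: every example, whether or not a modality is present, comes out with the same shapes, which lets JAX compile a function once instead of once per input shape.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fixed lengths per modality; the real values live in the
# project's configuration and are not given in this summary.
FIXED_LENGTHS: Dict[str, int] = {"text": 8, "image": 4, "audio": 4}
PAD_ID = 0  # assumed padding token id


def pad_or_truncate(tokens: List[int], length: int) -> Tuple[List[int], List[int]]:
    """Return (tokens, mask), both exactly `length` long; mask is 1 for real tokens."""
    kept = tokens[:length]
    n_pad = length - len(kept)
    return kept + [PAD_ID] * n_pad, [1] * len(kept) + [0] * n_pad


def convert_features(example: Dict[str, Optional[List[int]]]) -> Dict[str, List[int]]:
    """Map an example with optional modalities to fixed-size token and mask lists."""
    out: Dict[str, List[int]] = {}
    for name, length in FIXED_LENGTHS.items():
        tokens = example.get(name) or []  # a missing modality becomes all padding
        out[f"{name}_tokens"], out[f"{name}_mask"] = pad_or_truncate(tokens, length)
    return out


# Text is present, the image is explicitly missing, audio is absent entirely.
features = convert_features({"text": [5, 9, 2], "image": None})
print(features["text_tokens"])
print(features["image_mask"])
```

The mask lists let downstream code ignore padded positions, so a missing modality contributes nothing while still occupying a slot of the expected size.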

Quick Start & Requirements

  • Install: pip install -e . (for GPU/CPU) or pip install -e '.[tpu]' (for TPU). Additional dependencies for demo: pip install -e '.[demo]'.
  • Prerequisites: Python 3.8+ recommended (Python 3.9 may have orbax.checkpoint conflicts). CUDA for GPU. LLaMa tokenizer model file required.
  • Checkpoints: Available on S3 (e.g., s3://ai2-prior-uio/public/uio2-checkpoints/large-3m). Download command example: aws s3 --no-sign-request cp --recursive s3://ai2-prior-uio/public/uio2-checkpoints/large-3m large-3m --exclude "state*".
  • Demo: Run jupyter notebook demo.ipynb after installing demo dependencies.
  • Docs: https://github.com/allenai/unified-io-2
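
The checkpoint download can also be scripted. The sketch below only assembles the same AWS CLI invocation shown in the Checkpoints item; the helper name checkpoint_download_cmd is hypothetical, and large-3m is the only checkpoint name confirmed in this summary, so any other name passed in is an assumption to check against the repository.

```python
import shlex
from typing import List

# Bucket prefix taken from the checkpoint path quoted above.
BUCKET_PREFIX = "s3://ai2-prior-uio/public/uio2-checkpoints"


def checkpoint_download_cmd(name: str, dest: str) -> List[str]:
    """Build the argv list for an anonymous, recursive S3 copy of one checkpoint."""
    return [
        "aws", "s3", "--no-sign-request", "cp", "--recursive",
        f"{BUCKET_PREFIX}/{name}", dest,
        "--exclude", "state*",  # same filter as in the documented command
    ]


cmd = checkpoint_download_cmd("large-3m", "large-3m")
print(shlex.join(cmd))  # shell-quoted form, handy for logging or copy-paste
```

To actually run the download, pass the list to subprocess.run(cmd, check=True); this requires the AWS CLI to be installed and on the PATH.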

Highlighted Details

  • Supports vision, language, audio, and action modalities.
  • Codebase is modified from T5X, enabling TPU training.
  • Offers checkpoints in T5X format for XXL, XL, and Large model sizes.
  • Includes data preprocessing scripts and visualization tools.
  • Training and evaluation scripts are provided, configurable via gin files.

Maintenance & Community

The project is maintained by the Allen Institute for AI (AI2). The README does not mention community channels (Discord/Slack) or a roadmap.

Licensing & Compatibility

The README does not explicitly state a license. T5X, the codebase it builds on, is Apache-2.0 licensed, so a permissive license is plausible, but users should confirm the repository's actual license terms before using it in commercial or closed-source projects.

Limitations & Caveats

GPU/CPU setups are noted as not well-tested, with a primary focus on TPUs. Python 3.9 may encounter compatibility issues with orbax.checkpoint and JAX. Some datasets require manual preprocessing steps beyond the provided scripts.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 13 stars in the last 90 days
