OpenFlamingo provides an open-source PyTorch implementation for training and evaluating large multimodal models, inspired by DeepMind's Flamingo. For researchers and practitioners working on vision-language tasks such as image captioning or visual question answering, it enables rapid adaptation to new tasks via in-context learning.
How It Works
OpenFlamingo integrates pretrained vision encoders (e.g., OpenCLIP) with pretrained language models (e.g., MPT, LLaMA) using cross-attention layers. This architecture lets the model condition text generation on interleaved image and text inputs, so it can learn new tasks few-shot from only a handful of in-context examples.
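A minimal sketch of how these pieces are wired together using the library's `create_model_and_transforms` factory; the specific encoder and language-model identifiers below are illustrative assumptions, and argument names may differ across releases:

```python
# Sketch: assembling an OpenFlamingo model from a pretrained vision encoder
# and a pretrained language model (identifiers below are illustrative).
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",            # OpenCLIP vision backbone
    clip_vision_encoder_pretrained="openai",        # OpenCLIP pretrained-weights tag
    lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",  # frozen language model (assumed)
    tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
    cross_attn_every_n_layers=1,  # insert a gated cross-attention layer every N LM layers
)
```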
Quick Start & Requirements
- Install: `pip install open-flamingo[all]`, or `conda env create -f environment.yml` (see the checkpoint-loading sketch after this list).
- Prerequisites: PyTorch, Hugging Face Transformers, OpenCLIP. Specific language and vision models may have additional requirements.
- Resources: Requires significant compute for training; inference requirements depend on model size.
- Links: Paper, Blog Posts, Demo
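As a quick-start illustration, one way to fetch a released checkpoint from the Hugging Face Hub and load it into the model built in the earlier sketch; the repository ID and filename are assumptions based on the published 3B checkpoint and should be checked against the model card:

```python
# Sketch: downloading a released OpenFlamingo checkpoint and loading its weights.
# The repo ID and filename are assumptions; verify the exact names on the model card.
import torch
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    "openflamingo/OpenFlamingo-3B-vitl-mpt1b",  # assumed 3B checkpoint repo
    "checkpoint.pt",
)
# strict=False because the checkpoint typically stores only the trainable Flamingo
# parameters, not the frozen vision-encoder / language-model weights.
model.load_state_dict(torch.load(checkpoint_path), strict=False)
```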
Highlighted Details
- Supports various pretrained vision encoders (OpenCLIP) and language models (MPT, LLaMA, OPT, etc.).
- Offers multiple released model checkpoints (3B to 9B parameters) with benchmark results on COCO and VQAv2.
- Provides example scripts for training and evaluation.
- Enables text generation conditioned on interleaved image and text inputs (see the generation sketch after this list).
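A hedged sketch of what such interleaved, few-shot prompting looks like, reusing `model`, `image_processor`, and `tokenizer` from the earlier sketch; the `<image>` and `<|endofchunk|>` special tokens and the `vision_x`/`lang_x` argument names follow the upstream examples but should be treated as assumptions to verify against the repository:

```python
# Sketch: few-shot captioning with two in-context examples and one query image.
# Assumes `model`, `image_processor`, `tokenizer` from the earlier sketch, plus
# three PIL images: demo_image_one, demo_image_two, query_image (hypothetical).
import torch

# Stack images into shape (batch, num_images, num_frames, channels, height, width).
vision_x = torch.stack(
    [image_processor(demo_image_one),
     image_processor(demo_image_two),
     image_processor(query_image)],
    dim=0,
).unsqueeze(1).unsqueeze(0)

# Interleave text with <image> placeholders; <|endofchunk|> separates examples.
tokenizer.padding_side = "left"
lang_x = tokenizer(
    ["<image>An image of two cats.<|endofchunk|>"
     "<image>An image of a bathroom sink.<|endofchunk|>"
     "<image>An image of"],
    return_tensors="pt",
)

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print(tokenizer.decode(generated[0]))
```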
Maintenance & Community
- Developed by a team from University of Washington, Stanford, AI2, UCSB, and Google.
- Codebase is based on Lucidrains' flamingo implementation and David Hansmair's flamingo-mini.
- Open to contributions and questions via GitHub issues.
Licensing & Compatibility
- License: Apache 2.0.
- Compatibility: Permissive license allows for commercial use and integration with closed-source projects.
Limitations & Caveats
- The README notes that a previous LLaMA-based checkpoint has been deprecated in favor of the v2 release.
- Training MPT-1B models may require a modified version of the base model due to its handling of specific `kwargs`.