open_flamingo by mlfoundations

Open-source framework for training large multimodal models

created 2 years ago
3,988 stars

Top 12.5% on sourcepulse

Project Summary

OpenFlamingo provides an open-source PyTorch implementation for training and evaluating large multimodal models, inspired by DeepMind's Flamingo. It lets researchers and practitioners working on vision-language tasks, such as image captioning or visual question answering, adapt models rapidly to new tasks via in-context learning.

How It Works

OpenFlamingo integrates pretrained vision encoders (e.g., OpenCLIP) with pretrained language models (e.g., MPT, LLaMA) using cross-attention layers. This architecture lets the model condition text generation on interleaved image and text inputs, enabling few-shot learning: the model adapts to a new task from only a handful of examples in the prompt.
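To make the mechanism concrete, here is a minimal single-head cross-attention sketch in NumPy: text-token queries attend over visual-token keys/values, so each text hidden state becomes a visually conditioned mixture. The learned projection matrices (W_q, W_k, W_v), multi-head structure, and gating used in the actual model are omitted for clarity; this is an illustration of the idea, not OpenFlamingo's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, image_features):
    """Single-head cross-attention: text queries attend over image keys/values.

    text_tokens:    (T, d) hidden states from the language model
    image_features: (V, d) visual features from the vision encoder
    """
    d = text_tokens.shape[-1]
    # Illustrative identity projections; real models learn W_q, W_k, W_v.
    Q, K, V = text_tokens, image_features, image_features
    scores = Q @ K.T / np.sqrt(d)       # (T, V): each text token vs. each visual token
    weights = softmax(scores, axis=-1)  # per-text-token distribution over visual tokens
    return weights @ V, weights         # (T, d) visually conditioned text states

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))    # 5 text tokens, hidden dim 8
image = rng.normal(size=(3, 8))   # 3 visual tokens, hidden dim 8
out, w = cross_attention(text, image)
```

In the full model, these cross-attention blocks are inserted between frozen language-model layers, so only the new layers (and perception modules) need training.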

Quick Start & Requirements

  • Install: pip install "open-flamingo[all]" (quotes prevent shell globbing) or conda env create -f environment.yml
  • Prerequisites: PyTorch, Hugging Face Transformers, OpenCLIP. Specific language and vision models may have additional requirements.
  • Resources: Requires significant compute for training; inference requirements depend on model size.
  • Links: Paper, Blog Posts, Demo
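Once installed, few-shot use centers on building an interleaved prompt in which `<image>` placeholders mark where demonstration and query images go. The sketch below assumes the `<image>` and `<|endofchunk|>` special tokens shown in the project's examples; exact token names may differ between checkpoints, so treat this as an illustration of the prompt structure.

```python
def build_few_shot_prompt(example_captions, query_prefix="An image of"):
    """Build an interleaved image-text prompt for few-shot captioning.

    example_captions: one caption string per in-context demonstration image.
    Each "<image>" placeholder marks where an image is consumed; the token
    names follow OpenFlamingo's examples and are an assumption here.
    """
    parts = [f"<image>{query_prefix} {cap}<|endofchunk|>" for cap in example_captions]
    parts.append(f"<image>{query_prefix}")  # open-ended slot for the query image
    return "".join(parts)

prompt = build_few_shot_prompt([
    "two cats sleeping on a couch.",
    "a bathroom sink.",
])
```

The tokenizer encodes this string while the matching images, in the same order, are passed through the vision encoder; the model then continues the text after the final `<image>` placeholder.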

Highlighted Details

  • Supports various pretrained vision encoders (OpenCLIP) and language models (MPT, LLaMA, OPT, etc.).
  • Offers multiple released model checkpoints (3B to 9B parameters) with benchmark results on COCO and VQAv2.
  • Provides example scripts for training and evaluation.
  • Enables text generation conditioned on interleaved image and text inputs.

Maintenance & Community

  • Developed by a team from University of Washington, Stanford, AI2, UCSB, and Google.
  • Codebase is based on Lucidrains' flamingo implementation and David Hansmair's flamingo-mini.
  • Open to contributions and questions via GitHub issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

  • The README notes that a previous LLaMA-based checkpoint has been deprecated in favor of the v2 release.
  • Training with MPT-1B may require a modified model class, as the stock Hugging Face implementation does not handle certain keyword arguments used during training.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 95 stars in the last 90 days
