open_flamingo by mlfoundations

Open-source framework for training large multimodal models

Created 2 years ago
4,010 stars

Top 12.3% on SourcePulse

Project Summary

OpenFlamingo is an open-source PyTorch implementation for training and evaluating large multimodal models, inspired by DeepMind's Flamingo. It lets researchers and practitioners working on vision-language tasks such as image captioning and visual question answering adapt models to new tasks rapidly via in-context learning.

How It Works

OpenFlamingo combines pretrained vision encoders (e.g., OpenCLIP) with pretrained language models (e.g., MPT, LLaMA) by inserting cross-attention layers into the language model. This lets the model condition text generation on interleaved image and text inputs, so it can adapt to new tasks from only a few in-context examples.
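
A minimal construction sketch, using the MPT-1B configuration from the README (the encoder and model paths below are the README's example values):

    from open_flamingo import create_model_and_transforms

    # Glue a pretrained OpenCLIP vision encoder to a pretrained language
    # model; cross-attention layers are inserted into the language model
    # every `cross_attn_every_n_layers` decoder blocks.
    model, image_processor, tokenizer = create_model_and_transforms(
        clip_vision_encoder_path="ViT-L-14",
        clip_vision_encoder_pretrained="openai",
        lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",
        tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
        cross_attn_every_n_layers=1,
    )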

Quick Start & Requirements

  • Install: pip install open-flamingo[all] or conda env create -f environment.yml (a checkpoint-loading sketch follows this list)
  • Prerequisites: PyTorch, Hugging Face Transformers, and OpenCLIP; specific language and vision models may have additional requirements.
  • Resources: training requires significant compute; inference requirements depend on model size.
  • Links: Paper, Blog Posts, Demo
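
A minimal checkpoint-loading sketch, assuming the 3B release listed in the README (the repo id, filename, and strict=False flag follow the README's loading example):

    from huggingface_hub import hf_hub_download
    import torch

    # Fetch a released checkpoint from the Hugging Face Hub and load it
    # into a model built with create_model_and_transforms (see above).
    checkpoint_path = hf_hub_download(
        "openflamingo/OpenFlamingo-3B-vitl-mpt1b", "checkpoint.pt"
    )
    model.load_state_dict(torch.load(checkpoint_path), strict=False)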

Highlighted Details

  • Supports a range of pretrained vision encoders (via OpenCLIP) and language models (MPT, LLaMA, OPT, etc.).
  • Offers multiple released model checkpoints (3B to 9B parameters) with benchmark results on COCO and VQAv2.
  • Provides example scripts for training and evaluation.
  • Enables text generation conditioned on interleaved image and text inputs (sketched below).
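
To make the interleaved format concrete, here is a few-shot captioning sketch along the lines of the README's example; `images` (a list of three PIL images) is an assumption here, and the special tokens <image> and <|endofchunk|> delimit the (image, text) chunks:

    import torch

    # Two in-context (image, caption) pairs followed by a query image;
    # "<image>" marks where each image attends and "<|endofchunk|>"
    # closes each demonstration.
    prompt = (
        "<image>An image of two cats.<|endofchunk|>"
        "<image>An image of a bathroom sink.<|endofchunk|>"
        "<image>An image of"
    )
    lang_x = tokenizer([prompt], return_tensors="pt")

    # vision_x is shaped (batch, num_media, num_frames, channels, H, W);
    # `images` is assumed to be a list of three PIL images.
    vision_x = torch.stack(
        [image_processor(img) for img in images], dim=0
    ).unsqueeze(0).unsqueeze(2)

    generated = model.generate(
        vision_x=vision_x,
        lang_x=lang_x["input_ids"],
        attention_mask=lang_x["attention_mask"],
        max_new_tokens=20,
        num_beams=3,
    )
    print(tokenizer.decode(generated[0]))

The model completes the pattern set by the two demonstrations, generating a caption for the final query image.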

Maintenance & Community

  • Developed by a team from University of Washington, Stanford, AI2, UCSB, and Google.
  • The codebase builds on lucidrains' flamingo implementation and David Hansmair's flamingo-mini.
  • Open to contributions and questions via GitHub issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

  • The README notes that the earlier LLaMA-based checkpoint has been deprecated in favor of the v2 release.
  • Training MPT-1B models may require a modified model class due to how the original handles keyword arguments.
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 21 stars in the last 30 days

Explore Similar Projects

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 3 years ago, updated 2 years ago

Otter by EvolvingLMMs-Lab

3k stars
Multimodal model for improved instruction following and in-context learning
Created 2 years ago, updated 1 year ago