open_flamingo by mlfoundations

Open-source framework for training large multimodal models

Created 2 years ago
4,010 stars

Top 12.3% on SourcePulse

Project Summary

OpenFlamingo is an open-source PyTorch implementation for training and evaluating large multimodal models, inspired by DeepMind's Flamingo. It lets researchers and practitioners working on vision-language tasks such as image captioning and visual question answering adapt models to new tasks rapidly via in-context learning.

How It Works

OpenFlamingo combines pretrained vision encoders (e.g., OpenCLIP) with pretrained language models (e.g., MPT, LLaMA) by inserting cross-attention layers into the language model. This lets the model condition text generation on interleaved image and text inputs, so it can adapt to new tasks from only a few in-context examples.
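
A minimal construction sketch, using the MPT-1B configuration from the README (the encoder and model paths below are the README's example values):

    from open_flamingo import create_model_and_transforms

    # Glue a pretrained OpenCLIP vision encoder to a pretrained language
    # model; cross-attention layers are inserted into the language model
    # every `cross_attn_every_n_layers` decoder blocks.
    model, image_processor, tokenizer = create_model_and_transforms(
        clip_vision_encoder_path="ViT-L-14",
        clip_vision_encoder_pretrained="openai",
        lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",
        tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
        cross_attn_every_n_layers=1,
    )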

Quick Start & Requirements

  • Install: pip install open-flamingo[all] or conda env create -f environment.yml (a checkpoint-loading sketch follows this list)
  • Prerequisites: PyTorch, Hugging Face Transformers, and OpenCLIP; specific language and vision models may have additional requirements.
  • Resources: training requires significant compute; inference requirements depend on model size.
  • Links: Paper, Blog Posts, Demo
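
A minimal checkpoint-loading sketch, assuming the 3B release listed in the README (the repo id, filename, and strict=False flag follow the README's loading example):

    from huggingface_hub import hf_hub_download
    import torch

    # Fetch a released checkpoint from the Hugging Face Hub and load it
    # into a model built with create_model_and_transforms (see above).
    checkpoint_path = hf_hub_download(
        "openflamingo/OpenFlamingo-3B-vitl-mpt1b", "checkpoint.pt"
    )
    model.load_state_dict(torch.load(checkpoint_path), strict=False)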

Highlighted Details

  • Supports a range of pretrained vision encoders (via OpenCLIP) and language models (MPT, LLaMA, OPT, etc.).
  • Offers multiple released model checkpoints (3B to 9B parameters) with benchmark results on COCO and VQAv2.
  • Provides example scripts for training and evaluation.
  • Enables text generation conditioned on interleaved image and text inputs (sketched below).
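
To make the interleaved format concrete, here is a few-shot captioning sketch along the lines of the README's example; `images` (a list of three PIL images) is an assumption here, and the special tokens <image> and <|endofchunk|> delimit the (image, text) chunks:

    import torch

    # Two in-context (image, caption) pairs followed by a query image;
    # "<image>" marks where each image attends and "<|endofchunk|>"
    # closes each demonstration.
    prompt = (
        "<image>An image of two cats.<|endofchunk|>"
        "<image>An image of a bathroom sink.<|endofchunk|>"
        "<image>An image of"
    )
    lang_x = tokenizer([prompt], return_tensors="pt")

    # vision_x is shaped (batch, num_media, num_frames, channels, H, W);
    # `images` is assumed to be a list of three PIL images.
    vision_x = torch.stack(
        [image_processor(img) for img in images], dim=0
    ).unsqueeze(0).unsqueeze(2)

    generated = model.generate(
        vision_x=vision_x,
        lang_x=lang_x["input_ids"],
        attention_mask=lang_x["attention_mask"],
        max_new_tokens=20,
        num_beams=3,
    )
    print(tokenizer.decode(generated[0]))

The model completes the pattern set by the two demonstrations, generating a caption for the final query image.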

Maintenance & Community

  • Developed by a team from University of Washington, Stanford, AI2, UCSB, and Google.
  • The codebase builds on lucidrains' flamingo implementation and David Hansmair's flamingo-mini.
  • Open to contributions and questions via GitHub issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

  • The README notes that the earlier LLaMA-based checkpoint has been deprecated in favor of the v2 release.
  • Training MPT-1B models may require a modified model class due to how the original handles keyword arguments.
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 21 stars in the last 30 days

Explore Similar Projects

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 3 years ago, updated 2 years ago

Otter by EvolvingLMMs-Lab

3k stars
Multimodal model for improved instruction following and in-context learning
Created 2 years ago, updated 1 year ago