Otter is a multi-modal large language model (LMM) designed for instruction following and in-context learning with images and videos. It is based on the OpenFlamingo architecture and trained on the MIMIC-IT dataset, offering an open-source alternative for researchers and developers working with vision-language tasks.
How It Works
Otter leverages the Flamingo architecture, which excels at processing multiple interleaved image and text inputs. It is trained using an in-context instruction tuning methodology on the MIMIC-IT dataset, which comprises 2.8 million instruction-response pairs. This approach enables Otter to understand and respond to natural language instructions related to visual content, including complex reasoning and multi-round conversations.
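To make the in-context instruction tuning concrete, here is a minimal sketch of what an instruction-response pair with in-context demonstrations might look like. The field names and the User/GPT turn template are illustrative assumptions, not the MIMIC-IT dataset's exact schema:

```python
# Hypothetical MIMIC-IT-style instruction-response pair.
# Field names are illustrative assumptions, not the dataset's exact schema.
pair = {
    "instruction": "What is unusual about this image?",
    "answer": "A man is ironing clothes on the roof of a moving taxi.",
    "image_ids": ["img_000001"],
    # Related pairs supplied as in-context demonstrations during tuning.
    "in_context_examples": [
        {"instruction": "Describe the scene.",
         "answer": "A busy city street with yellow taxis."},
    ],
}

def to_training_text(p):
    """Flatten a pair plus its in-context examples into one training string."""
    turns = p["in_context_examples"] + [p]
    return "\n".join(f"User: {t['instruction']} GPT: {t['answer']}" for t in turns)

print(to_training_text(pair))
```

The key idea is that each training sample bundles its own demonstrations, so the model learns to condition on prior examples rather than on a single isolated instruction.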
Quick Start & Requirements
- Install: conda env create -f environment.yml
- Prerequisites: a PyTorch build matching your CUDA version (e.g., Torch 2.0.0 with CUDA 11.7), transformers>=4.28.0, accelerate>=0.18.0; at least 16 GB of GPU memory for local execution.
- Resources: Official Huggingface integration available.
- Docs: MIMIC-IT Dataset README, Run Otter Locally
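Since version mismatches are a common setup pitfall, a small pre-flight check can verify the stated prerequisites before launching anything heavy. This is a generic sketch (only the package names and minimum versions come from the requirements above; assumes plain dotted version strings):

```python
import importlib

def meets_minimum(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '4.30.2' >= '4.28.0'."""
    parse = lambda v: tuple(int(x) for x in v.split("."))
    return parse(installed) >= parse(required)

# Minimum versions taken from the prerequisites listed above.
REQUIREMENTS = {"transformers": "4.28.0", "accelerate": "0.18.0"}

def check_environment():
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for pkg, minimum in REQUIREMENTS.items():
        try:
            mod = importlib.import_module(pkg)
            if not meets_minimum(mod.__version__, minimum):
                problems.append(f"{pkg} {mod.__version__} < {minimum}")
        except ImportError:
            problems.append(f"{pkg} not installed")
    return problems
```

Numeric tuple comparison avoids the lexicographic trap where the string "4.9.0" would sort after "4.28.0".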
Highlighted Details
- Introduces OtterHD, a fine-tuned version of Fuyu-8B for high-resolution image interpretation without an explicit vision encoder.
- Supports multiple interleaved image/video inputs, a novel feature for instruction-tuned LMMs.
- Includes MagnifierBench for evaluating the identification of small objects and spatial relationships.
- Provides training scripts for various LMMs (OpenFlamingo, Idefics, Fuyu) and datasets (MIMIC-IT, M3IT, LLAVAR).
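The interleaved image/video input that Flamingo-style models consume can be sketched as a prompt-construction helper. The `<image>` placeholder token and the `User:/GPT:<answer>` template are assumptions modeled on common Flamingo-style formatting, not the repo's exact API:

```python
# Sketch of an interleaved visual/text prompt for a Flamingo-style model.
# The "<image>" token and "User:/GPT:<answer>" template are assumptions,
# not the repository's exact prompt format.
def build_interleaved_prompt(visual_inputs, instruction):
    """Prefix one <image> token per image or video frame, then the instruction turn."""
    image_tokens = "<image>" * len(visual_inputs)
    return f"{image_tokens}User: {instruction} GPT:<answer>"

prompt = build_interleaved_prompt(
    ["frame_0.png", "frame_1.png"],
    "What changes between these two frames?",
)
```

Each `<image>` token marks where a visual embedding is spliced into the text stream, which is what lets a single prompt interleave multiple images or video frames with language.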
Maintenance & Community
- Actively updated with new models (OtterHD) and benchmarks (MagnifierBench).
- Welcomes suggestions and PRs for code improvement.
- Contact available for custom scenario development.
Licensing & Compatibility
- The README does not explicitly state a license for the code; the project appears to be permissively licensed, but specific model weights and datasets may carry different terms.
Limitations & Caveats
- The maintainers note that the code may not be perfectly polished.
- Previous code versions may not be runnable due to major changes in dataset organization.
- Requires careful environment setup to match CUDA and PyTorch versions.