OPERA by shikiw

Decoding method for multimodal LLMs, addressing hallucinations (CVPR 2024)

created 1 year ago
353 stars

Top 80.0% on sourcepulse

View on GitHub
Project Summary

OPERA is a decoding method designed to mitigate hallucination in multimodal large language models (MLLMs). It targets researchers and practitioners working with MLLMs who need to improve the factual accuracy of generated content without requiring additional training data or external knowledge sources. The primary benefit is a "nearly free lunch" approach to reducing hallucinations.

How It Works

OPERA builds on the observation that MLLM hallucinations often coincide with a self-attention pattern in which the model over-focuses on a few "summary" tokens while neglecting the rest of the context. It adds a penalty term to the beam-search score to discourage this "over-trust" in aggregated tokens. A complementary "retrospection-allocation" strategy detects when such an over-trust pattern has already formed, rolls decoding back to the implicated summary token, and re-selects among alternative candidate tokens.
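
The idea lends itself to a compact illustration. The sketch below is a toy rendition, not the repository's code: it derives an over-trust score from the per-column product of a local window of self-attention rows, subtracts it from candidate log-probabilities during beam scoring, and flags a retrospection-style rollback to the implicated summary token once the score crosses a threshold. Every name and default value (over_trust_penalty, window, scale, threshold) is illustrative.

```python
# Toy sketch of the over-trust penalty idea, NOT the repository's implementation.
# `attn` is assumed to hold the decoder's self-attention over already generated
# tokens (lower-triangular, row-normalised); all names are illustrative.
import torch

def over_trust_penalty(attn: torch.Tensor, window: int = 8, scale: float = 50.0):
    """Score how strongly a recent window of attention rows piles onto one past token.

    Returns (penalty, anchor_idx): a value to subtract from the beam score and the
    index of the candidate "summary" token that attracted the attention mass.
    """
    recent = attn[-window:]            # local window of the most recent attention rows
    col_mass = recent.prod(dim=0)      # per-column product: large only when a single
                                       # column stays high in *every* row of the window
    anchor_idx = int(col_mass.argmax())
    penalty = scale * float(col_mass.max())
    return penalty, anchor_idx

def rescore(logprobs: torch.Tensor, attn: torch.Tensor, threshold: float = 0.2):
    """Penalised candidate scores plus a flag signalling a retrospection-style rollback."""
    penalty, anchor_idx = over_trust_penalty(attn)
    scores = logprobs - penalty        # same penalty applied to each candidate of this hypothesis
    rollback = penalty > threshold     # if True, decoding would return to anchor_idx
                                       # and pick a different continuation
    return scores, rollback, anchor_idx

if __name__ == "__main__":
    torch.manual_seed(0)
    attn = torch.rand(12, 12).tril()
    attn = attn / attn.sum(dim=-1, keepdim=True)      # row-normalised like softmax attention
    logprobs = torch.log_softmax(torch.rand(5), dim=-1)
    print(rescore(logprobs, attn))
```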

Quick Start & Requirements

  • Install by creating a conda environment (conda env create -f environment.yml), activating it (conda activate opera), and installing a modified transformers package (python -m pip install -e transformers-4.29.2).
  • Requires PyTorch and the bundled, modified copy of the Hugging Face Transformers library (version 4.29.2); a quick import check is sketched after this list.
  • Evaluation requires the MSCOCO 2014 dataset and specific pre-trained model checkpoints (LLaVA-1.5, Vicuna, MiniGPT-4, Shikra).
  • Official documentation and demo notebooks are available.
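
Because the method lives inside the locally modified transformers-4.29.2 checkout, it is worth confirming that the editable install is the copy Python actually imports. The snippet below is an illustrative sanity check, not part of the repository:

```python
# Illustrative sanity check (not from the repository): confirm that the editable,
# modified transformers build installed above is the one Python actually imports.
import transformers

print(transformers.__version__)  # expected: 4.29.2
print(transformers.__file__)     # expected: a path inside the local transformers-4.29.2 checkout
```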

Highlighted Details

  • Implemented as a modification to the transformers generation/utils.py module (a call-pattern sketch follows this list).
  • Supports multiple MLLMs including InstructBLIP, MiniGPT-4, LLaVA-1.5, and Shikra.
  • Provides evaluation scripts for POPE, CHAIR, and GPT-4V benchmarks.
  • Achieves high accuracy (e.g., 90.3% on InstructBLIP 7B for POPE random split).
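
For orientation, the sketch below shows the kind of call pattern such a modified generate() implies when running beam search with OPERA enabled. It is an assumption-laden illustration: the OPERA-specific keyword arguments (opera_decoding, scale_factor, threshold, num_attn_candidates, penalty_weights) are illustrative names that may not match the repository's actual signatures or defaults; consult its demo and evaluation scripts for the real interface.

```python
# Call-pattern sketch only. The OPERA-specific keyword names and defaults below are
# assumptions for illustration and may differ from the modified generate() signature.
import torch

def opera_generate(model, inputs: dict, beams: int = 5):
    """Beam-search generation with illustrative OPERA-style decoding flags."""
    with torch.inference_mode():
        return model.generate(
            **inputs,
            num_beams=beams,          # OPERA is applied on top of beam search
            max_new_tokens=512,
            output_attentions=True,   # the over-trust penalty is computed from self-attention
            opera_decoding=True,      # assumed switch enabling the penalty term
            scale_factor=50,          # assumed scale of the penalty
            threshold=15,             # assumed trigger for the retrospection-allocation rollback
            num_attn_candidates=5,    # assumed number of candidates considered on rollback
            penalty_weights=1.0,      # assumed weight of the penalty in the beam score
        )
```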

Maintenance & Community

  • Based on LAVIS and MiniGPT-4 codebases.
  • Citation provided for the CVPR 2024 paper.

Licensing & Compatibility

  • The README does not explicitly state a license. The project builds on the LAVIS and MiniGPT-4 codebases, each of which ships with its own license; suitability for commercial use therefore depends both on those upstream terms and on whatever license this repository ultimately adopts.

Limitations & Caveats

  • The core implementation is tied to a specific version of the Transformers library (transformers-4.29.2), requiring manual adaptation for other versions.
  • Evaluation setup involves downloading large datasets and multiple specific model checkpoints.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

21 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

HALOs by ContextualAI

Library for aligning LLMs using human-aware loss functions

created 1 year ago, updated 2 weeks ago
873 stars

Top 0.2% on sourcepulse