Decoding method for multimodal LLMs, addressing hallucinations (CVPR 2024)
OPERA is a decoding method designed to mitigate hallucination in multi-modal large language models (MLLMs). It targets researchers and practitioners working with MLLMs who need to improve the factual accuracy of generated content without requiring additional training data or external knowledge sources. The primary benefit is a "nearly free lunch" approach to reducing hallucinations.
How It Works
OPERA builds on the observation that MLLM hallucinations often coincide with a self-attention pattern in which the model over-focuses on a few "summary" tokens and neglects the rest of the context. It introduces a penalty term during beam-search decoding that discourages this "over-trust" in specific tokens. In addition, a "retrospection-allocation" strategy rolls generation back to a detected summary token and re-selects the next token when the over-trust pattern persists across decoding steps.
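The mechanics are easiest to see in code. Below is a minimal PyTorch sketch of the over-trust penalty, assuming a single attention map averaged over heads for a local window of recent tokens; the repo's actual implementation differs in details such as candidate selection and multi-head handling.

```python
import torch

def over_trust_penalty(attn_window: torch.Tensor, scale: float = 50.0) -> torch.Tensor:
    """Score the strongest 'knowledge aggregation' column in a local
    self-attention window. A minimal sketch of the idea, not the paper's
    exact formulation."""
    w = attn_window.tril() * scale                  # keep causal part, amplify small values
    w = torch.where(w > 0, w, torch.ones_like(w))   # empty (upper-triangle) entries are neutral
    col_scores = w.prod(dim=0)                      # column-wise product over later tokens
    return col_scores.max()                         # strength of the over-trust pattern

def penalized_beam_score(logprob_sum: torch.Tensor,
                         attn_window: torch.Tensor,
                         alpha: float = 1.0) -> torch.Tensor:
    # Ordinary beam score minus the over-trust penalty: beams whose recent
    # tokens all over-attend to one summary token get ranked lower.
    return logprob_sum - alpha * over_trust_penalty(attn_window)
```

A column whose entries stay large marks a candidate summary token that later tokens keep attending to; taking the product (rather than a mean) makes the score sensitive to the pattern being consistent across every recent step.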
Quick Start & Requirements
Setup involves creating the conda environment (conda env create -f environment.yml), activating it (conda activate opera), and installing the repo's modified transformers package (python -m pip install -e transformers-4.29.2).
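Once the modified package is installed, decoding is driven through the usual generate call. The sketch below shows the general shape; the OPERA-specific keyword names (opera_decoding, scale_factor, threshold, num_attn_candidates, penalty_weights) are taken from memory of the repo's demo scripts and should be treated as assumptions to verify against the code.

```python
import torch

def opera_generate(model, inputs, max_new_tokens=256):
    """Run beam-search generation with OPERA enabled.

    `model` is an MLLM loaded with the repo's modified transformers and
    `inputs` holds the prepared image/prompt tensors; both are placeholders.
    """
    with torch.no_grad():
        return model.generate(
            **inputs,
            num_beams=5,                 # OPERA works on top of beam search
            max_new_tokens=max_new_tokens,
            output_attentions=True,      # the penalty needs self-attention maps
            opera_decoding=True,         # assumed flag: switch on the penalty
            scale_factor=50,             # assumed: attention scaling before scoring
            threshold=15,                # assumed: retrospection trigger
            num_attn_candidates=5,       # assumed: candidate summary tokens per step
            penalty_weights=1.0,         # assumed: weight of the penalty term
        )
```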
Highlighted Details
The core decoding changes live in transformers.generation.utils.py within the modified transformers package.
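Those changes hook into the beam-search loop. Below is a simplified sketch of the retrospection-allocation step, with hypothetical helper names and data structures (the repo's internals differ):

```python
def maybe_rollback(seq, summary_positions, window=5):
    """Simplified retrospection-allocation (hypothetical helper, not the
    repo's API): if the last `window` decoding steps all flagged the same
    summary-token position, truncate the sequence back to that position
    and ban the token chosen there so a different continuation is tried."""
    recent = summary_positions[-window:]      # flagged column per recent step
    if len(recent) == window and len(set(recent)) == 1:
        pos = recent[0]
        banned_token = seq[pos]               # exclude this token on the retry
        return seq[:pos], banned_token        # roll back and re-decode from pos
    return seq, None
```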
Maintenance & Community
The repository's last commit was roughly 11 months ago and the project is marked inactive.
Licensing & Compatibility
Limitations & Caveats
The implementation is pinned to a specific, modified transformers package (transformers-4.29.2), requiring manual adaptation of the changes for other versions.
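A quick guard against running in a mismatched environment (a simple sketch; the version string comes from the pinned package above):

```python
import transformers

# OPERA's patches target the bundled transformers-4.29.2; fail early
# rather than silently running unpatched decoding.
assert transformers.__version__.startswith("4.29"), (
    f"Expected transformers 4.29.x, found {transformers.__version__}; "
    "OPERA's modifications must be ported manually for other versions."
)
```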