OPERA by shikiw

Decoding method for multimodal LLMs, addressing hallucinations (CVPR 2024)

created 1 year ago
353 stars

Top 80.0% on sourcepulse

View on GitHub
Project Summary

OPERA is a decoding method designed to mitigate hallucination in multimodal large language models (MLLMs). It targets researchers and practitioners working with MLLMs who need to improve the factual accuracy of generated content without requiring additional training data or external knowledge sources. The primary benefit is a "nearly free lunch" approach to reducing hallucinations.

How It Works

OPERA builds on the observation that MLLM hallucinations often coincide with a self-attention pattern in which the model over-focuses on a few "summary" tokens while neglecting the rest of the context. It adds a penalty term to the beam-search score to discourage this "over-trust" in aggregated tokens. A complementary "retrospection-allocation" strategy detects when such an over-trust pattern has already formed, rolls decoding back to the implicated summary token, and re-selects among alternative candidate tokens.
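
The idea lends itself to a compact illustration. The sketch below is a toy rendition, not the repository's code: it derives an over-trust score from the per-column product of a local window of self-attention rows, subtracts it from candidate log-probabilities during beam scoring, and flags a retrospection-style rollback to the implicated summary token once the score crosses a threshold. Every name and default value (over_trust_penalty, window, scale, threshold) is illustrative.

```python
# Toy sketch of the over-trust penalty idea, NOT the repository's implementation.
# `attn` is assumed to hold the decoder's self-attention over already generated
# tokens (lower-triangular, row-normalised); all names are illustrative.
import torch

def over_trust_penalty(attn: torch.Tensor, window: int = 8, scale: float = 50.0):
    """Score how strongly a recent window of attention rows piles onto one past token.

    Returns (penalty, anchor_idx): a value to subtract from the beam score and the
    index of the candidate "summary" token that attracted the attention mass.
    """
    recent = attn[-window:]            # local window of the most recent attention rows
    col_mass = recent.prod(dim=0)      # per-column product: large only when a single
                                       # column stays high in *every* row of the window
    anchor_idx = int(col_mass.argmax())
    penalty = scale * float(col_mass.max())
    return penalty, anchor_idx

def rescore(logprobs: torch.Tensor, attn: torch.Tensor, threshold: float = 0.2):
    """Penalised candidate scores plus a flag signalling a retrospection-style rollback."""
    penalty, anchor_idx = over_trust_penalty(attn)
    scores = logprobs - penalty        # same penalty applied to each candidate of this hypothesis
    rollback = penalty > threshold     # if True, decoding would return to anchor_idx
                                       # and pick a different continuation
    return scores, rollback, anchor_idx

if __name__ == "__main__":
    torch.manual_seed(0)
    attn = torch.rand(12, 12).tril()
    attn = attn / attn.sum(dim=-1, keepdim=True)      # row-normalised like softmax attention
    logprobs = torch.log_softmax(torch.rand(5), dim=-1)
    print(rescore(logprobs, attn))
```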

Quick Start & Requirements

  • Install by creating a conda environment (conda env create -f environment.yml), activating it (conda activate opera), and installing a modified transformers package (python -m pip install -e transformers-4.29.2).
  • Requires PyTorch and the bundled, modified copy of the Hugging Face Transformers library (version 4.29.2); a quick import check is sketched after this list.
  • Evaluation requires the MSCOCO 2014 dataset and specific pre-trained model checkpoints (LLaVA-1.5, Vicuna, MiniGPT-4, Shikra).
  • Official documentation and demo notebooks are available.
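
Because the method lives inside the locally modified transformers-4.29.2 checkout, it is worth confirming that the editable install is the copy Python actually imports. The snippet below is an illustrative sanity check, not part of the repository:

```python
# Illustrative sanity check (not from the repository): confirm that the editable,
# modified transformers build installed above is the one Python actually imports.
import transformers

print(transformers.__version__)  # expected: 4.29.2
print(transformers.__file__)     # expected: a path inside the local transformers-4.29.2 checkout
```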

Highlighted Details

  • Implemented as a modification to the transformers generation/utils.py module (a call-pattern sketch follows this list).
  • Supports multiple MLLMs including InstructBLIP, MiniGPT-4, LLaVA-1.5, and Shikra.
  • Provides evaluation scripts for POPE, CHAIR, and GPT-4V benchmarks.
  • Achieves high accuracy (e.g., 90.3% on InstructBLIP 7B for POPE random split).
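
For orientation, the sketch below shows the kind of call pattern such a modified generate() implies when running beam search with OPERA enabled. It is an assumption-laden illustration: the OPERA-specific keyword arguments (opera_decoding, scale_factor, threshold, num_attn_candidates, penalty_weights) are illustrative names that may not match the repository's actual signatures or defaults; consult its demo and evaluation scripts for the real interface.

```python
# Call-pattern sketch only. The OPERA-specific keyword names and defaults below are
# assumptions for illustration and may differ from the modified generate() signature.
import torch

def opera_generate(model, inputs: dict, beams: int = 5):
    """Beam-search generation with illustrative OPERA-style decoding flags."""
    with torch.inference_mode():
        return model.generate(
            **inputs,
            num_beams=beams,          # OPERA is applied on top of beam search
            max_new_tokens=512,
            output_attentions=True,   # the over-trust penalty is computed from self-attention
            opera_decoding=True,      # assumed switch enabling the penalty term
            scale_factor=50,          # assumed scale of the penalty
            threshold=15,             # assumed trigger for the retrospection-allocation rollback
            num_attn_candidates=5,    # assumed number of candidates considered on rollback
            penalty_weights=1.0,      # assumed weight of the penalty in the beam score
        )
```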

Maintenance & Community

  • Based on LAVIS and MiniGPT-4 codebases.
  • Citation provided for the CVPR 2024 paper.

Licensing & Compatibility

  • The README does not explicitly state a license. The project builds on the LAVIS and MiniGPT-4 codebases, each of which ships with its own license; suitability for commercial use therefore depends both on those upstream terms and on whatever license this repository ultimately adopts.

Limitations & Caveats

  • The core implementation is tied to a specific version of the Transformers library (transformers-4.29.2), requiring manual adaptation for other versions.
  • Evaluation setup involves downloading large datasets and multiple specific model checkpoints.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

21 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

HALOs by ContextualAI

Library for aligning LLMs using human-aware loss functions

created 1 year ago, updated 2 weeks ago
873 stars

Top 0.2% on sourcepulse