Training-free method for mitigating hallucinations in LVLMs during decoding
This repository provides Visual Contrastive Decoding (VCD), a training-free method for reducing object hallucinations in Large Vision-Language Models (LVLMs). It is aimed at researchers and developers working with LVLMs who need to improve the factual accuracy of generated text without retraining their models. VCD integrates into existing LVLM decoding pipelines with only small code changes.
How It Works
VCD works by contrasting the output distributions produced from the original visual input and a deliberately distorted copy of it (e.g., with noise added). From these two distributions it forms a new contrastive decoding distribution that down-weights tokens the model still favors when the image is distorted, since such predictions stem from language priors and co-occurrence statistics rather than visual evidence, and up-weights tokens grounded in the actual image content. This mitigates the over-reliance on statistical biases and unimodal priors that the authors identify as key causes of object hallucination.
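Per decoding step, the contrastive scoring can be sketched roughly as follows. This is a minimal PyTorch illustration of the idea, not the repository's implementation; the function name is invented here, and the cd_alpha/cd_beta defaults are only indicative.

import torch
import torch.nn.functional as F

def contrastive_logits(logits_original, logits_distorted, cd_alpha=1.0, cd_beta=0.1):
    # Amplify what the clean image supports and subtract what the model
    # predicts even from the distorted image (prior-driven tokens).
    logits_vcd = (1 + cd_alpha) * logits_original - cd_alpha * logits_distorted
    # Adaptive plausibility constraint: keep only tokens whose probability
    # under the original image is within cd_beta of the top token.
    probs = F.softmax(logits_original, dim=-1)
    cutoff = cd_beta * probs.max(dim=-1, keepdim=True).values
    return logits_vcd.masked_fill(probs < cutoff, float("-inf"))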
Quick Start & Requirements
conda create -yn vcd python=3.9
conda activate vcd
cd VCD
pip install -r requirements.txt
Integration involves calling vcd_utils.vcd_sample.evolve_vcd_sampling() and then passing images_cd along with the cd_alpha/cd_beta parameters to the model.generate function, as in the sketch below.
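A rough end-to-end sketch of that integration, assuming a LLaVA-style pipeline: model, input_ids, and image_tensor are placeholders for objects your existing code already provides, and the add_diffusion_noise helper plus the noise_step value are assumptions based on the repository's evaluation scripts, so verify them against your checkout.

from vcd_utils.vcd_sample import evolve_vcd_sampling
from vcd_utils.vcd_add_noise import add_diffusion_noise  # assumed helper name

# Patch the sampling loop so model.generate understands the VCD arguments.
evolve_vcd_sampling()

# Build the distorted counterpart of the already-preprocessed image tensor.
image_tensor_cd = add_diffusion_noise(image_tensor, noise_step=500)

output_ids = model.generate(
    input_ids,
    images=image_tensor,
    images_cd=image_tensor_cd,  # distorted image used for the contrast
    cd_alpha=1.0,               # strength of the contrastive term
    cd_beta=0.1,                # adaptive plausibility cutoff
    do_sample=True,
    max_new_tokens=64,
)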
Maintenance & Community
The project is associated with DAMO-NLP-SG. The paper is available on arXiv. Related projects include Contrastive Decoding, InstructBLIP, Qwen-VL, and LLaVA 1.5.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not detail specific limitations, unsupported platforms, or known bugs. The integration requires modifying existing model generation code, which may introduce complexity.