Training-free method for correcting hallucinations in multimodal LLMs
Top 52.9% on sourcepulse
Woodpecker addresses the critical issue of hallucination in Multimodal Large Language Models (MLLMs), where generated text contradicts image content. It offers a training-free, post-hoc correction method for researchers and developers working with MLLMs, aiming to improve the factual accuracy and reliability of multimodal outputs.
How It Works
Woodpecker employs a five-stage pipeline: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Because correction happens post hoc, the method integrates with various MLLMs without retraining, and the staged design provides interpretability by exposing intermediate outputs.
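The five stages can be sketched as a toy pipeline. Everything below is an illustrative stand-in (simple string and set logic in place of real models); the actual project plugs in an open-vocabulary detector (GroundingDINO) and a VQA model, and none of these function names come from its API:

```python
KNOWN_OBJECTS = {"dog", "cat", "ball", "frisbee"}  # toy concept vocabulary

def extract_key_concepts(answer: str) -> list[str]:
    """Stage 1: key concept extraction from the MLLM's answer."""
    tokens = [w.strip(".,!?").lower() for w in answer.split()]
    return [t for t in tokens if t in KNOWN_OBJECTS]

def formulate_questions(concepts: list[str]) -> dict[str, str]:
    """Stage 2: one verification question per extracted concept."""
    return {c: f"Is there a {c} in the image?" for c in concepts}

def validate(questions: dict[str, str], detected: set[str]) -> dict[str, bool]:
    """Stage 3: visual knowledge validation (here: a toy object lookup
    standing in for a detector/VQA model run on the image)."""
    return {c: c in detected for c in questions}

def generate_claims(validation: dict[str, bool]) -> list[str]:
    """Stage 4: explicit visual claims derived from the validation."""
    return [f"There is {'a' if ok else 'no'} {c} in the image."
            for c, ok in validation.items()]

def correct(answer: str, validation: dict[str, bool]) -> str:
    """Stage 5: drop sentences that mention refuted concepts."""
    refuted = {c for c, ok in validation.items() if not ok}
    kept = [s for s in answer.split(". ")
            if not any(c in s.lower() for c in refuted)]
    return ". ".join(kept)

# Example: the MLLM hallucinated a frisbee; only a dog is actually detected.
answer = "A dog is running. It is chasing a frisbee"
concepts = extract_key_concepts(answer)                          # ['dog', 'frisbee']
validation = validate(formulate_questions(concepts), detected={"dog"})
print(correct(answer, validation))                               # -> A dog is running
```

Because each stage emits an inspectable intermediate (concepts, questions, validation results, claims), a failure can be traced to the stage that produced it.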
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt, download the spaCy models en_core_web_lg, en_core_web_md, and en_core_web_sm, and set up GroundingDINO. Run inference with python inference.py --image-path ... --query ... --text ... . Demo setup requires modifying gradio_demo.py and running CUDA_VISIBLE_DEVICES=0,1 python gradio_demo.py.
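Assembled in order, the setup steps look roughly as follows. The elided arguments (...) are elided in the README itself and must be filled in with your own paths and query; the spacy download commands are one standard way to fetch the named models, not a command the README spells out:

```shell
# Install Python dependencies.
pip install -r requirements.txt

# Fetch the three spaCy English models the project references.
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_md
python -m spacy download en_core_web_sm

# Single-image correction (fill in the elided arguments).
python inference.py --image-path ... --query ... --text ...

# Launch the Gradio demo after editing gradio_demo.py.
CUDA_VISIBLE_DEVICES=0,1 python gradio_demo.py
```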
Highlighted Details
Maintenance & Community
The project acknowledges contributions from mPLUG-Owl, GroundingDINO, BLIP-2, and LLaMA-Adapter. A contact email (bradyfu24@gmail.com) and a WeChat ID (xjtupanda) are provided for questions.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The unspecified license could impede commercial adoption. The project also relies on external models such as GroundingDINO, so correction quality depends on the quality of these dependencies.