Woodpecker by VITA-MLLM

Training-free method for correcting hallucinations in multimodal LLMs

created 1 year ago
640 stars

Top 52.9% on sourcepulse

Project Summary

Woodpecker addresses the critical issue of hallucination in Multimodal Large Language Models (MLLMs), where generated text contradicts image content. It offers a training-free, post-hoc correction method for researchers and developers working with MLLMs, aiming to improve the factual accuracy and reliability of multimodal outputs.

How It Works

Woodpecker employs a five-stage pipeline: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. This modular, post-remedy approach allows it to be easily integrated with various MLLMs without retraining. The staged process also provides interpretability by exposing intermediate outputs.
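The five stages can be sketched as a chain of functions whose intermediate results are recorded for inspection. This is a minimal illustration, not the repository's code: `Trace`, `run_pipeline`, and every `toy_*` function are hypothetical names, the "image" is a plain set of object names standing in for a real detector, and the toy stages only handle object presence.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Intermediate outputs of each stage, kept so a run can be inspected."""
    concepts: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)
    answers: dict[str, str] = field(default_factory=dict)
    claims: list[str] = field(default_factory=list)
    corrected: str = ""


def run_pipeline(text: str, image: object, extract: Callable, ask: Callable,
                 validate: Callable, to_claims: Callable, rewrite: Callable) -> Trace:
    """Run the five stages in order; each stage is a swappable callable."""
    trace = Trace()
    trace.concepts = extract(text)                                       # 1. key concept extraction
    trace.questions = ask(text, trace.concepts)                          # 2. question formulation
    trace.answers = {q: validate(image, q) for q in trace.questions}     # 3. visual knowledge validation
    trace.claims = to_claims(trace.answers)                              # 4. visual claim generation
    trace.corrected = rewrite(text, trace.claims)                        # 5. hallucination correction
    return trace


# Toy stand-ins (hypothetical). A real system would call external models here.
def toy_extract(text: str) -> list[str]:
    return [w for w in ("dog", "cat") if w in text.lower()]


def toy_ask(text: str, concepts: list[str]) -> list[str]:
    return [f"Is there a {c} in the image?" for c in concepts]


def toy_validate(image: set, question: str) -> str:
    concept = question.removeprefix("Is there a ").removesuffix(" in the image?")
    return "yes" if concept in image else "no"


def toy_claims(answers: dict[str, str]) -> list[str]:
    claims = []
    for question, answer in answers.items():
        concept = question.removeprefix("Is there a ").removesuffix(" in the image?")
        article = "a" if answer == "yes" else "no"
        claims.append(f"There is {article} {concept} in the image.")
    return claims


def toy_rewrite(text: str, claims: list[str]) -> str:
    # Drop any sentence that mentions an object the claims say is absent.
    absent = [c.split()[3] for c in claims if c.startswith("There is no ")]
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [s for s in sentences if not any(a in s.lower() for a in absent)]
    return ". ".join(kept) + "."


trace = run_pipeline(
    "A dog sits on the grass. A cat sleeps nearby.",
    {"dog", "grass"},
    toy_extract, toy_ask, toy_validate, toy_claims, toy_rewrite,
)
print(trace.claims)
print(trace.corrected)
```

Because every stage writes into `trace`, a reader can see which question or claim led to a given change, which is the kind of interpretability the staged design is meant to provide.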

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.10; spaCy with the en_core_web_lg, en_core_web_md, and en_core_web_sm models; and GroundingDINO.
  • Usage: python inference.py --image-path ... --query ... --text ... runs inference. For the demo, edit gradio_demo.py and then run CUDA_VISIBLE_DEVICES=0,1 python gradio_demo.py.
  • Links: arXiv Paper, Demo, GroundingDINO, spaCy.
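
Before running inference, it can help to confirm that the listed prerequisites are present. The sketch below is an assumption-laden convenience check, not part of the repository: `missing_prerequisites` is a hypothetical helper, it relies on spaCy models being installed as importable packages under their model names, and it does not check GroundingDINO because how that is installed may vary.

```python
import importlib.util
import sys

# Model names taken from the prerequisites list above.
SPACY_MODELS = ("en_core_web_lg", "en_core_web_md", "en_core_web_sm")


def missing_prerequisites(modules=SPACY_MODELS, python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The summary names Python 3.10; other versions are flagged, not rejected.
    if sys.version_info[:2] != tuple(python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, python))
        problems.append(f"Python {wanted} expected, found {found}")
    # find_spec returns None when a top-level package cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```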

Highlighted Details

  • Improves accuracy on the POPE benchmark by 30.66%/24.33% over the MiniGPT-4/mPLUG-Owl baselines.
  • Evaluated on LLaVA, mPLUG-Owl, Otter, and MiniGPT-4.
  • Proposes new open-ended evaluation metrics (accuracy, detailedness) using GPT-4V.
  • Offers interpretability through intermediate outputs.
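
POPE poses yes/no questions about whether an object is present in an image, so its scores reduce to binary-classification metrics. The following is a rough sketch of how such metrics could be computed, with "yes" treated as the positive class. `yes_no_metrics` is a hypothetical helper and the sample answers are invented for illustration; this is not the benchmark's official scoring script.

```python
def yes_no_metrics(predictions, labels):
    """Compute accuracy, precision, recall, F1, and yes-ratio for yes/no answers."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and y == "yes" for p, y in pairs)
    fp = sum(p == "yes" and y == "no" for p, y in pairs)
    tn = sum(p == "no" and y == "no" for p, y in pairs)
    fn = sum(p == "no" and y == "yes" for p, y in pairs)
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / len(pairs),
    }


# Invented example: one false "yes" among four answers.
metrics = yes_no_metrics(["yes", "yes", "no", "yes"], ["yes", "no", "no", "yes"])
print(metrics["accuracy"])  # 0.75
```

A model that hallucinates objects tends to answer "yes" too often, which shows up as a high yes-ratio together with lower precision.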

Maintenance & Community

The project acknowledges contributions from mPLUG-Owl, GroundingDINO, BLIP-2, and LLaMA-Adapter. A contact email (bradyfu24@gmail.com) and a WeChat ID (xjtupanda) are provided for questions.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The missing license leaves the terms for commercial adoption unclear. The pipeline also relies on external models such as GroundingDINO, so the quality of its corrections depends on the quality of those dependencies.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
