Woodpecker by VITA-MLLM

Training-free method for correcting hallucinations in multimodal LLMs

Created 2 years ago
638 stars

Top 52.0% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Woodpecker addresses the critical issue of hallucination in Multimodal Large Language Models (MLLMs), where generated text contradicts image content. It offers a training-free, post-hoc correction method for researchers and developers working with MLLMs, aiming to improve the factual accuracy and reliability of multimodal outputs.

How It Works

Woodpecker employs a five-stage pipeline: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. This modular, post-remedy approach allows it to be easily integrated with various MLLMs without retraining. The staged process also provides interpretability by exposing intermediate outputs.
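For orientation, here is a minimal Python sketch of how the five stages chain together. Every function name and signature below is a hypothetical placeholder rather than the repository's actual API (the real entry point is inference.py); only the stage order follows the paper.

# Minimal sketch of Woodpecker's five-stage correction pipeline.
# All helper names are hypothetical placeholders, not the repo's real API.

def extract_key_concepts(answer: str) -> list[str]:
    """Stage 1: pull the main objects mentioned in the MLLM's answer."""
    ...

def formulate_questions(answer: str, concepts: list[str]) -> list[str]:
    """Stage 2: turn each concept into verification questions
    (e.g. object existence, counts, attributes)."""
    ...

def validate_visual_knowledge(image_path: str, questions: list[str]) -> dict:
    """Stage 3: answer the questions against the image, e.g. with an
    open-set detector such as GroundingDINO plus a VQA model."""
    ...

def generate_visual_claims(visual_facts: dict) -> str:
    """Stage 4: convert the validated facts into structured claims
    about the image (a small visual knowledge base)."""
    ...

def correct_hallucinations(answer: str, claims: str) -> str:
    """Stage 5: rewrite the original answer so it is consistent
    with the visual claims, typically via an LLM prompt."""
    ...

def woodpecker_correct(image_path: str, query: str, answer: str) -> str:
    """Run one post-hoc correction pass over an (image, query, answer) triple."""
    concepts = extract_key_concepts(answer)
    questions = formulate_questions(answer, concepts)
    visual_facts = validate_visual_knowledge(image_path, questions)
    claims = generate_visual_claims(visual_facts)
    return correct_hallucinations(answer, claims)

Each intermediate value (concepts, questions, visual facts, claims) corresponds to one of the interpretable intermediate outputs mentioned above.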

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.10; spaCy with the en_core_web_lg, en_core_web_md, and en_core_web_sm models (see the sketch after this list); and GroundingDINO.
  • Usage: Run inference via python inference.py --image-path ... --query ... --text .... Demo setup requires modifying gradio_demo.py and running CUDA_VISIBLE_DEVICES=0,1 python gradio_demo.py.
  • Links: arXiv Paper, Demo, GroundingDINO, spaCy.
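The spaCy models are normally fetched with python -m spacy download <name>. As a convenience, the small helper below (not part of the repository) checks for the three models and downloads any that are missing using spaCy's public API.

# Check that the spaCy models Woodpecker expects are installed, and fetch any
# that are missing. Assumes spaCy itself was installed via requirements.txt.
import spacy
from spacy.cli import download

REQUIRED_MODELS = ["en_core_web_lg", "en_core_web_md", "en_core_web_sm"]

for name in REQUIRED_MODELS:
    try:
        spacy.load(name)   # raises OSError if the model is absent
        print(f"{name}: OK")
    except OSError:
        print(f"{name}: missing, downloading...")
        download(name)     # equivalent to `python -m spacy download <name>`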

Highlighted Details

  • Achieves significant accuracy improvements on the POPE benchmark (30.66% / 24.33% over the MiniGPT-4 / mPLUG-Owl baselines).
  • Evaluated on LLaVA, mPLUG-Owl, Otter, and MiniGPT-4.
  • Proposes new open-ended evaluation metrics (accuracy, detailedness) judged by GPT-4V (see the sketch after this list).
  • Offers interpretability through intermediate outputs.
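The open-ended metric is prompt-based: a vision-capable GPT model sees the image, the question, and an answer, then returns an accuracy and a detailedness score. The snippet below is a rough, hypothetical illustration of that idea using the OpenAI Python SDK; the prompt, scale, and model name are assumptions, not the authors' exact evaluation setup.

# Rough illustration of GPT-4V-style open-ended scoring (accuracy, detailedness).
# The prompt, 0-10 scale, and model choice are hypothetical simplifications.
import base64
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def score_answer(image_path: str, question: str, answer: str) -> str:
    """Ask a vision-capable GPT model to rate an answer's accuracy and detailedness."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    prompt = (
        f"Question about the image: {question}\n"
        f"Model answer: {answer}\n"
        "Rate the answer on two axes from 0 to 10 and briefly justify each score:\n"
        "1) accuracy: does the answer match what the image actually shows?\n"
        "2) detailedness: how rich and specific is the answer?"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model; the paper used GPT-4V
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content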

Maintenance & Community

The project acknowledges contributions from mPLUG-Owl, GroundingDINO, BLIP-2, and LLaMA-Adapter. A contact email (bradyfu24@gmail.com) and a WeChat ID (xjtupanda) are provided for questions.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify a license, which may hinder commercial adoption. The pipeline also depends on external models such as GroundingDINO, so correction quality is tied to the quality of those dependencies.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Douwe Kiela (Cofounder of Contextual AI), and 1 more.

lens by ContextualAI

0.3%
353
Vision-language research paper using LLMs
Created 2 years ago
Updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Omar Sanseviero (DevRel at Google DeepMind).

gill by kohjingyu

0%
463
Multimodal LLM for generating/retrieving images and generating text
Created 2 years ago
Updated 1 year ago
Starred by Max Howell (Author of Homebrew), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

big-sleep by lucidrains

0%
3k
CLI tool for text-to-image generation
Created 4 years ago
Updated 3 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Luis Capelo (Cofounder of Lightning AI).

GroundingDINO by IDEA-Research

0.5%
9k
Object detection via grounded pre-training research paper
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

DeepSeek-VL2 by deepseek-ai

0.1%
5k
MoE vision-language model for multimodal understanding
Created 9 months ago
Updated 6 months ago