LRV-Instruction by FuxiaoLiu

Instruction tuning research paper for mitigating hallucination in large multimodal models

created 2 years ago
284 stars

Top 93.1% on sourcepulse

View on GitHub
Project Summary

This repository provides LRV-Instruction, a dataset and fine-tuning methodology to mitigate hallucinations in Large Multi-Modal Models (LMMs). It targets researchers and developers working with LMMs, offering a way to improve model robustness and accuracy by training on both positive and negative (hallucination-inducing) instructions.

How It Works

LRV-Instruction utilizes a dataset of 320k visual instructions, including negative examples designed to expose and correct model hallucinations. The approach involves robust instruction tuning, where models are trained on these diverse instruction types. This allows LMMs to learn to abstain from answering when uncertain or when presented with misleading information, thereby improving their reliability.
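As an illustration of the positive/negative split described above, a training record might pair one image with both a grounded instruction and a hallucination-inducing one. The field names and values below are hypothetical, for illustration only, not the repository's actual schema:

```python
# Hypothetical sketch of positive and negative instruction records in the
# LRV-Instruction style; field names and IDs are illustrative, not the real schema.
positive_example = {
    "image_id": "VG_2410213",          # a Visual Genome image (made-up ID)
    "instruction": "What color is the umbrella the woman is holding?",
    "answer": "The umbrella is red.",  # grounded in the image content
    "type": "positive",
}

negative_example = {
    "image_id": "VG_2410213",
    # Asks about an object that does NOT appear in the image
    # ("non-existent element manipulation"):
    "instruction": "How many dogs are sitting under the umbrella?",
    # The target answer teaches the model to decline rather than hallucinate:
    "answer": "There are no dogs in the image.",
    "type": "negative",
}

def is_negative(record: dict) -> bool:
    """A record is negative if it was constructed to induce hallucination."""
    return record["type"] == "negative"
```

Training on both record types is what lets the tuned model learn to refuse or correct misleading premises instead of inventing objects.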

Quick Start & Requirements

  • LRV-Instruction V1 (MiniGPT4-7B): clone the repo, create the environment from environment.yml, prepare Vicuna weights, download the pretrained checkpoint, set the dataset path, then run demo.py or inference.py. Training requires a V100 32GB.
  • LRV-Instruction V2 (mplug-owl-7B): follow the mplug-owl setup, download the mplug-owl checkpoint and LoRA weights, modify model_worker.py for LoRA integration, then run serve.web_server or serve.inference. Training requires a V100.
  • Dataset: Available on Hugging Face and via direct links. Visual Genome images are required.
  • Evaluation: Uses GPT4-Assisted Visual Instruction Evaluation (GAVIE) with Visual Genome annotations.
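The GAVIE evaluation step can be sketched as a prompt-construction routine: GPT-4 is shown ground-truth Visual Genome annotations in place of the image and asked to grade a model's answer on relevancy and accuracy. The rubric wording and helper below are assumptions for illustration, not GAVIE's exact prompt:

```python
def build_gavie_prompt(dense_captions, instruction, model_response):
    """Assemble a GAVIE-style evaluation prompt (illustrative wording only).

    dense_captions: ground-truth region descriptions from Visual Genome,
    standing in for the image so a text-only judge can check grounding.
    """
    annotations = "\n".join(f"- {c}" for c in dense_captions)
    return (
        "You are grading a vision-language model.\n"
        f"Ground-truth image annotations:\n{annotations}\n\n"
        f"Instruction: {instruction}\n"
        f"Model response: {model_response}\n\n"
        "Score the response from 0 to 10 on two axes:\n"
        "Relevancy: does it directly address the instruction?\n"
        "Accuracy: is it consistent with the annotations "
        "(no hallucinated objects)?"
    )

prompt = build_gavie_prompt(
    ["a red umbrella held by a woman", "wet pavement in the foreground"],
    "How many dogs are under the umbrella?",
    "There are no dogs visible in the image.",
)
```

In practice the assembled prompt would be sent to the GPT-4 API and the two scores parsed from its reply; using dense captions as a textual proxy for the image is what makes a text-only judge workable here.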

Highlighted Details

  • mplug-owl fine-tuned on LRV-Instruction achieves state-of-the-art results on the MME benchmark.
  • Introduces GAVIE, a GPT-4 based evaluation framework for LMM hallucination.
  • Dataset includes negative instructions with non-existent and existent element manipulation.
  • Offers model checkpoints for MiniGPT4-7B and mplug-owl-7B.

Maintenance & Community

The project is associated with multiple accepted papers at top-tier conferences (ICLR 2024, CVPR 2024, NAACL 2024). Links to demos and the project page are provided.

Licensing & Compatibility

The repository is licensed under the BSD 3-Clause License. Code is based on MiniGPT4 and mplug-Owl, which also use the BSD 3-Clause License. This license is permissive and generally compatible with commercial use.

Limitations & Caveats

The README mentions future plans to release checkpoints for MiniGPT4-13B and LLaVA. The hosted demos may be intermittently unavailable; an email address is provided for support.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
