LRV-Instruction: robust instruction tuning for mitigating hallucination in large multimodal models
This repository provides LRV-Instruction, a dataset and fine-tuning methodology to mitigate hallucinations in Large Multi-Modal Models (LMMs). It targets researchers and developers working with LMMs, offering a way to improve model robustness and accuracy by training on both positive and negative (hallucination-inducing) instructions.
How It Works
LRV-Instruction utilizes a dataset of 320k visual instructions, including negative examples designed to expose and correct model hallucinations. The approach involves robust instruction tuning, where models are trained on these diverse instruction types. This allows LMMs to learn to abstain from answering when uncertain or when presented with misleading information, thereby improving their reliability.
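To make the training recipe concrete, the sketch below shows one way positive and negative (hallucination-inducing) instruction samples could be merged into a single set of prompt/target pairs for fine-tuning. This is a minimal illustration, not the repository's actual data loader: the file names and the "image_id"/"question"/"answer" field names are assumptions rather than the official schema.

```python
import json
import random

def load_samples(path):
    # Each file is assumed to hold a list of {"image_id", "question", "answer"} records.
    with open(path) as f:
        return json.load(f)

def build_training_pairs(positive_path="lrv_positive.json",
                         negative_path="lrv_negative.json"):
    pairs = []
    # Positive instructions: the target answer is grounded in the image content.
    for s in load_samples(positive_path):
        prompt = f"<image {s['image_id']}> {s['question']}"
        pairs.append((prompt, s["answer"]))
    # Negative instructions: the target teaches the model to reject false premises,
    # e.g. to state that a mentioned object does not appear in the image.
    for s in load_samples(negative_path):
        prompt = f"<image {s['image_id']}> {s['question']}"
        pairs.append((prompt, s["answer"]))
    random.shuffle(pairs)  # mix both instruction types so every batch sees each kind
    return pairs
```

Mixing both instruction types in every batch is what pushes the model toward answering only when the visual evidence supports it.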
Quick Start & Requirements
environment.yml
, prepare Vicuna weights, download pretrained checkpoint, set dataset path, run demo.py
or inference.py
. Requires V100 32GB for training.model_worker.py
for LoRA integration, run serve.web_server
or serve.inference
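As a rough guide to the MiniGPT4-based entry points, the wrapper below launches demo.py or inference.py from Python. The --cfg-path and --gpu-id flags and the config file path are assumptions modeled on MiniGPT4-style scripts and may differ in the actual repository.

```python
import subprocess

def run_script(script, cfg_path="eval_configs/minigpt4_eval.yaml", gpu_id=0):
    # Launch the repository's demo or inference script as a subprocess;
    # flag names are assumed from MiniGPT4-style tooling.
    subprocess.run(
        ["python", script, "--cfg-path", cfg_path, "--gpu-id", str(gpu_id)],
        check=True,
    )

if __name__ == "__main__":
    run_script("demo.py")        # interactive demo
    # run_script("inference.py") # batch inference
```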
Highlighted Details
Maintenance & Community
The project is associated with multiple accepted papers at top-tier conferences (ICLR 2024, CVPR 2024, NAACL 2024). Links to demos and the project page are provided.
Licensing & Compatibility
The repository is licensed under the BSD 3-Clause License. Code is based on MiniGPT4 and mplug-Owl, which also use the BSD 3-Clause License. This license is permissive and generally compatible with commercial use.
Limitations & Caveats
The README notes planned releases of checkpoints for MiniGPT4-13B and LLaVA. The provided demos may not always be available; an email address is given for support.