RLHF-V by RLHF-V

CVPR'24 research on aligning MLLMs via fine-grained human feedback

created 1 year ago
287 stars

Top 92.3% on sourcepulse

View on GitHub
Project Summary

RLHF-V provides a framework for aligning Multimodal Large Language Models (MLLMs) using fine-grained human feedback to reduce hallucinations. It targets researchers and developers aiming to improve MLLM trustworthiness, offering a data-efficient method for correcting model behavior.

How It Works

The framework collects fine-grained correctional human feedback: annotators rewrite only the hallucinated segments of an MLLM response, yielding a dense, segment-level preference signal instead of a single response-level label. The model is then aligned on these targeted corrections (the paper trains with dense direct preference optimization, DDPO), which is what makes the approach data-efficient: substantial hallucination-rate reductions with minimal training time.
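To make the segment-level idea concrete, below is a minimal, hypothetical sketch of a segment-weighted DPO loss in the spirit of DDPO. The function name, arguments, and the specific weighting scheme are illustrative assumptions, not the repo's actual implementation.

```python
import torch.nn.functional as F

def segment_weighted_dpo_loss(
    policy_logps_good, policy_logps_bad,  # (batch, seq) per-token log-probs, policy model
    ref_logps_good, ref_logps_bad,        # (batch, seq) per-token log-probs, frozen reference
    corr_mask_good, corr_mask_bad,        # (batch, seq) 1.0 on human-corrected / hallucinated spans
    beta: float = 0.1,                    # DPO temperature (illustrative default)
    seg_weight: float = 5.0,              # extra weight on corrected spans (illustrative)
):
    """DDPO-style loss sketch: a standard DPO objective whose sequence
    log-probs give extra weight to the tokens annotators actually changed."""
    def seq_logp(token_logps, mask):
        # 1.0 on unchanged tokens, seg_weight on corrected/hallucinated spans
        weights = 1.0 + (seg_weight - 1.0) * mask
        return (weights * token_logps).sum(dim=-1)

    margin = (
        (seq_logp(policy_logps_good, corr_mask_good) - seq_logp(ref_logps_good, corr_mask_good))
        - (seq_logp(policy_logps_bad, corr_mask_bad) - seq_logp(ref_logps_bad, corr_mask_bad))
    )
    return -F.logsigmoid(beta * margin).mean()
```

Because the gradient concentrates on the corrected spans, each annotated response carries more signal than a single response-level preference label, which is where the data efficiency comes from.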

Quick Start & Requirements

  • Install: Clone the repository and set up a conda environment (conda create -n muffin python=3.10, then conda activate muffin). Install dependencies with pip install -e .; specific versions of transformers and flash-attention are recommended for reproducibility. The commands are collected in the sketch after this list.
  • Prerequisites: Python 3.10, CUDA (for flash-attention), COCO2014 dataset annotations for Object HalBench evaluation.
  • Resources: Training takes about 1 hour on 8 A100 GPUs to achieve the reported 34.8% hallucination-rate reduction.
  • Links: Project page, paper, demo.
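Collected into one place, and assuming the repository lives at RLHF-V/RLHF-V on GitHub as the header suggests, the quick start looks like this (package versions are left unpinned here; check the repo for the recommended transformers and flash-attention versions):

```bash
# Clone and enter the repo (URL assumed from the project header)
git clone https://github.com/RLHF-V/RLHF-V.git
cd RLHF-V

# Create and activate the conda environment
conda create -n muffin python=3.10
conda activate muffin

# Install the package and its dependencies
pip install -e .

# Optional: COCO2014 annotations, needed only for Object HalBench evaluation
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
unzip annotations_trainval2014.zip
```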

Highlighted Details

  • Achieves a 34.8% hallucination rate reduction in 1 hour on 8 A100 GPUs.
  • RLHF-V models have ranked #1 on MMHal-Bench among open-source models and outperform GPT-4V on Object HalBench.
  • Supports evaluation on LLaVA Bench, Object HalBench, and MMHal-Bench (a toy sketch of the Object HalBench metric follows this list).
  • Offers a large, diverse dataset of 5.7k fine-grained human-correction annotations.
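Object HalBench scores hallucination with CHAIR-style metrics over COCO object annotations, which is why the COCO2014 annotations appear in the prerequisites. A toy sketch of the object-level rate it reports (illustrative only; the real pipeline also normalizes object names and reports a response-level rate):

```python
def object_hallucination_rate(mentioned: set[str], ground_truth: set[str]) -> float:
    """Fraction of objects mentioned in a response that are absent from the image.

    Toy CHAIR-style metric; the repo's evaluation additionally handles synonyms
    and plural forms before matching against COCO annotations.
    """
    if not mentioned:
        return 0.0
    return len(mentioned - ground_truth) / len(mentioned)

# Example: the response mentions a dog that is not in the image -> 1/3 objects hallucinated.
rate = object_hallucination_rate({"person", "bicycle", "dog"}, {"person", "bicycle"})
print(f"{rate:.3f}")  # 0.333
```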

Maintenance & Community

The project is associated with THUNLP and has been integrated into later models such as MiniCPM-V 2.0 and OmniLMM-12B. Updates are regularly posted on Hugging Face and arXiv.

Licensing & Compatibility

  • License: BSD for the code; CC BY-NC 4.0 for the dataset.
  • Restrictions: Data and models are licensed for research use only and are additionally subject to the licenses of the underlying models and services (LLaMA, Vicuna, ChatGPT). Commercial use is prohibited under CC BY-NC 4.0.

Limitations & Caveats

The dataset and models are restricted to non-commercial research use, and any downstream use must also comply with the licenses of the base models.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 13 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Woosuk Kwon (author of vLLM), and 11 more.

WizardLM by nlpxucan

Top 0.1% on sourcepulse · 9k stars
LLMs built using Evol-Instruct for complex instruction following
created 2 years ago · updated 1 month ago
Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), Calvin French-Owen (co-founder of Segment), and 12 more.

StableLM by Stability-AI

Top 0.0% on sourcepulse · 16k stars
Language models by Stability AI
created 2 years ago · updated 1 year ago