prophet by MILVLG

VQA framework using answer heuristics to prompt LLMs

Created 3 years ago

277 stars

Top 93.6% on SourcePulse

Project Summary

This repository provides the official implementation for Prophet, a two-stage framework for knowledge-based Visual Question Answering (VQA). It targets researchers and practitioners in computer vision and NLP, enabling improved VQA performance by leveraging GPT-3 with answer heuristics derived from a trained VQA model.

How It Works

Prophet works in two stages. In stage one, a standard VQA model (MCAN) is trained on the target dataset and used to extract two kinds of answer heuristics: candidate answers and answer-aware examples. In stage two, these heuristics are placed in the prompt sent to GPT-3, guiding it toward more accurate answers. The authors report that this approach significantly outperforms prior state-of-the-art methods on the OK-VQA and A-OKVQA datasets.
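
The stage-two prompting step can be illustrated with a minimal Python sketch. It is not the repository's code: the Example class, the format_candidates and build_prompt functions, the prompt wording, and the use of a text context (such as an image caption) are all assumptions made for illustration. It shows only the general idea of combining in-context examples and confidence-ranked candidate answers into a single prompt.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One in-context example (hypothetical structure, not the repo's format)."""
    context: str                          # assumed text description of the image
    question: str
    candidates: list[tuple[str, float]]   # (answer, confidence) from the stage-one model
    answer: str                           # ground-truth answer for this example


def format_candidates(candidates, top_k=3):
    """Keep the top_k most confident candidates and render them as text."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    return ", ".join(f"{ans} ({conf:.2f})" for ans, conf in ranked)


def build_prompt(examples, context, question, candidates, top_k=3):
    """Assemble a prompt: instruction, in-context examples, then the test input."""
    # The instruction text below is an assumed placeholder, not the paper's wording.
    blocks = ["Answer the question using the context and the candidate answers."]
    for ex in examples:
        blocks.append(
            f"Context: {ex.context}\n"
            f"Question: {ex.question}\n"
            f"Candidates: {format_candidates(ex.candidates, top_k)}\n"
            f"Answer: {ex.answer}"
        )
    # The test input ends with an open "Answer:" for the language model to complete.
    blocks.append(
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Candidates: {format_candidates(candidates, top_k)}\n"
        f"Answer:"
    )
    return "\n\n".join(blocks)


example = Example(
    context="a man riding a wave on a surfboard",
    question="What sport is this?",
    candidates=[("surfing", 0.90), ("skiing", 0.05)],
    answer="surfing",
)
prompt = build_prompt(
    [example],
    context="a plate with a slice of pizza",
    question="Which country is this dish from?",
    candidates=[("italy", 0.60), ("america", 0.30), ("france", 0.05)],
    top_k=2,
)
print(prompt)
```

The resulting string would then be sent to the language model as a completion request. How candidates and examples are actually selected and formatted is defined by the repository's scripts and the paper, so treat the choices here (top-k by confidence, two decimal places) as placeholders.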

Quick Start & Requirements

Installation: Create a conda environment using conda env create -f environment.yml.
Prerequisites: Python >= 3.9, CUDA >= 11.3, PyTorch >= 1.12.0.
Hardware: Recommended: 1x RTX 3090 GPU, 50GB RAM, 300GB disk space (SSD recommended).
Data: Download MSCOCO 2014/2017 datasets and run bash scripts/extract_img_feats.sh.
Documentation: Usage details and scripts are available in the scripts directory.

Highlighted Details

Achieves 61.1% accuracy on OK-VQA and 55.7% on A-OKVQA.
Utilizes an MCAN model for initial VQA and heuristic extraction.
Prompts GPT-3 with generated answer candidates and examples.
Provides pre-trained and fine-tuned models for OK-VQA and A-OKVQA.

Maintenance & Community

The project is associated with the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering."
Updates were made in March and April 2023.

Licensing & Compatibility

Licensed under the Apache License 2.0.
Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The framework requires significant computational resources and disk space for data preparation and model training. It also relies on access to the OpenAI GPT-3 API, which may incur costs.
