VQA framework using answer heuristics to prompt LLMs
This repository provides the official implementation of Prophet, a two-stage framework for knowledge-based Visual Question Answering (VQA). It is aimed at researchers and practitioners in computer vision and NLP, and improves VQA performance by prompting GPT-3 with answer heuristics derived from a trained VQA model.
How It Works
Prophet uses a two-stage approach. In stage one, a standard VQA model (MCAN) is trained on the target dataset, and two kinds of answer heuristics are extracted from it: candidate answers and answer-aware examples. In stage two, these heuristics are included in the prompt given to GPT-3, guiding it toward more accurate answers. The method significantly outperforms previous state-of-the-art methods on the OK-VQA and A-OKVQA datasets.
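To make the stage-two idea concrete, here is a minimal Python sketch of how answer heuristics could be turned into a GPT-3 prompt. It assumes stage one has already produced, for every sample, a text caption, a latent feature vector, and a list of scored candidate answers. The Example class, the cosine-similarity selection of answer-aware examples, the prompt wording, and all sample data are hypothetical illustrations, not Prophet's actual code or prompt template. No API call is made; the returned string is what would be sent to the language model.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One VQA sample plus the heuristics a stage-one model would supply (hypothetical structure)."""
    caption: str                          # text description of the image (assumed context source)
    question: str
    feature: list[float]                  # latent feature from the trained VQA model (assumed)
    candidates: list[tuple[str, float]]   # (candidate answer, confidence) pairs
    answer: str = ""                      # ground truth for in-context examples; empty for the query


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_examples(query: Example, pool: list[Example], k: int) -> list[Example]:
    """Answer-aware examples: the k training samples whose features are closest to the query."""
    return sorted(pool, key=lambda ex: cosine(query.feature, ex.feature), reverse=True)[:k]


def format_block(ex: Example, top_n: int) -> str:
    """Render one sample with its top-n candidate answers and their confidences."""
    cands = sorted(ex.candidates, key=lambda c: c[1], reverse=True)[:top_n]
    cand_str = ", ".join(f"{a}({s:.2f})" for a, s in cands)
    block = (f"Context: {ex.caption}\n"
             f"Question: {ex.question}\n"
             f"Candidates: {cand_str}\n"
             f"Answer: {ex.answer}")
    return block.rstrip()  # the query block ends with a bare "Answer:" for the model to complete


def build_prompt(query: Example, pool: list[Example], k: int = 2, top_n: int = 3) -> str:
    """Hypothetical prompt layout: instruction, k in-context examples, then the query."""
    header = ("Please answer the question according to the context and candidate answers. "
              "Each candidate answer is followed by its confidence score.")
    blocks = [format_block(ex, top_n) for ex in select_examples(query, pool, k)]
    blocks.append(format_block(query, top_n))
    return header + "\n===\n" + "\n===\n".join(blocks)


# Toy data, invented for illustration only.
pool = [
    Example("a red double-decker bus on a street", "Which country is this bus associated with?",
            [1.0, 0.0], [("england", 0.80), ("london", 0.15)], "england"),
    Example("a plate of sushi", "What country does this food come from?",
            [0.0, 1.0], [("japan", 0.90), ("china", 0.05)], "japan"),
]
query = Example("a yellow taxi in traffic", "Which city is famous for these cabs?",
                [0.9, 0.1], [("new york", 0.70), ("london", 0.20)])

print(build_prompt(query, pool, k=1))
```

With k=1, the bus example is chosen because its feature vector is closest to the query's. Selecting examples by similarity in the VQA model's feature space, rather than at random, is what makes them "answer-aware" in this sketch.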
Quick Start & Requirements
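Create the environment:
conda env create -f environment.yml
Extract image features:
bash scripts/extract_img_feats.sh
The remaining stages are run with the scripts in the scripts directory.
Highlighted Details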
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The framework requires significant computational resources and disk space for data preparation and model training. It also relies on access to the OpenAI GPT-3 API, which may incur costs.