Image captioning research paper (CVPR 2022)
VisualGPT offers a data-efficient approach to image captioning by adapting pretrained language models, specifically GPT-2, as a decoder. This method targets researchers and practitioners in computer vision and natural language processing looking to leverage large language models for visual tasks with reduced data requirements.
How It Works
VisualGPT frames image captioning as a conditional language generation problem. It uses a pretrained GPT-2 model as the decoder, conditioning caption generation on visual features extracted from the image. The core innovation is a data-efficient adaptation strategy that lets the large language model be fine-tuned for captioning with only a small fraction of the training data that conventional captioning models need.
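The phrase "conditional language generation" can be made concrete with the standard autoregressive factorization. The notation here is an assumption introduced for illustration and does not come from the README: v denotes the visual features, y_1, ..., y_T the caption tokens, and theta the decoder parameters.

```latex
% Caption probability factorized token by token, conditioned on the image features v
p_\theta(y_{1:T} \mid v) = \prod_{t=1}^{T} p_\theta\left(y_t \mid y_{<t},\, v\right)

% Usual maximum-likelihood fine-tuning loss for one (image, caption) pair
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t},\, v\right)
```

Under this reading, the pretrained GPT-2 weights already provide the dependence on the previous tokens, and fine-tuning mainly has to teach the decoder to use v.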
Quick Start & Requirements
Install: create the conda environment from environment.yml, then activate it with conda activate visualgpt.
Prerequisites: the spaCy English model (python -m spacy download en) and the COCO dataset annotations and detections (coco_detections.hdf5).
Training: python train_visualGPT.py --batch_size 50 --head 12 --tau 0.2 --features_path coco_detections.hdf5 --annotation_folder annotations --lr 1e-4 --gpt_model_type gpt --random_seed 42 --log_file logs/log --exp_name experiment_log --decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 --train_percentage 0.001 --split_train_data
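To make the training flags easier to read, the sketch below parses the same argument string with argparse and derives the effective batch size. It is a hypothetical reconstruction, not the repository's actual parser: the flag names are copied from the command, but the types, the store_true treatment of --split_train_data, and the assumption that gradient accumulation multiplies the per-step batch size are all guesses.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the training command."""
    p = argparse.ArgumentParser(prog="train_visualGPT.py")
    p.add_argument("--batch_size", type=int)
    p.add_argument("--head", type=int)
    p.add_argument("--tau", type=float)
    p.add_argument("--features_path")
    p.add_argument("--annotation_folder")
    p.add_argument("--lr", type=float)
    p.add_argument("--gpt_model_type")
    p.add_argument("--random_seed", type=int)
    p.add_argument("--log_file")
    p.add_argument("--exp_name")
    p.add_argument("--decoder_layer", type=int)
    p.add_argument("--optimizer_type")
    p.add_argument("--gradient_accumulation_steps", type=int)
    p.add_argument("--train_percentage", type=float)
    # Assumed to be a boolean switch because it takes no value in the command.
    p.add_argument("--split_train_data", action="store_true")
    return p


# Argument string copied from the training command.
COMMAND_ARGS = (
    "--batch_size 50 --head 12 --tau 0.2 --features_path coco_detections.hdf5 "
    "--annotation_folder annotations --lr 1e-4 --gpt_model_type gpt "
    "--random_seed 42 --log_file logs/log --exp_name experiment_log "
    "--decoder_layer 12 --optimizer_type adamw --gradient_accumulation_steps 2 "
    "--train_percentage 0.001 --split_train_data"
)

args = build_parser().parse_args(shlex.split(COMMAND_ARGS))

# Assumption: gradients are accumulated over N steps before each optimizer update,
# so one update sees batch_size * N examples.
effective_batch = args.batch_size * args.gradient_accumulation_steps
print(effective_batch)
```

If accumulation works as assumed, each optimizer update covers 100 examples even though only 50 are held in memory at a time.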
Highlighted Details
Maintenance & Community
The project is associated with the CVPR 2022 conference. No specific community channels or active maintenance indicators are provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. The project acknowledges resources from Meshed Memory Transformer and Transformers, implying potential licensing considerations from those projects. Commercial use compatibility is not specified.
Limitations & Caveats
The provided training command sets --train_percentage to 0.001 (0.1% of the training data), so it is configured for the small-data setting rather than full-dataset training. Setup also requires downloading model weights and dataset files, which may be large.
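As a rough illustration of how small a 0.001 training fraction is, the sketch below draws a seeded random subset of example indices. Everything in it is an assumption: the helper subset_indices is hypothetical, the repository may subsample differently (for example over caption pairs rather than images), and 113,287 is the image count of the commonly used Karpathy COCO training split, used here only as a plausible dataset size.

```python
import random
from typing import List


def subset_indices(num_examples: int, fraction: float, seed: int) -> List[int]:
    """Pick a reproducible random subset covering `fraction` of the examples."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Round down, but always keep at least one example.
    k = max(1, int(num_examples * fraction))
    rng = random.Random(seed)  # local RNG so global random state is untouched
    return sorted(rng.sample(range(num_examples), k))


NUM_TRAIN_IMAGES = 113_287  # assumption: Karpathy-split COCO training images
indices = subset_indices(NUM_TRAIN_IMAGES, fraction=0.001, seed=42)
print(len(indices))  # 113
```

Under these assumptions the model would see only about a hundred training images, which is why the command is better read as a low-data experiment than as a full training recipe.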