Awesome-Prompting-on-Vision-Language-Model by JindongGu

Survey paper for vision-language model prompt engineering

created 2 years ago
475 stars

Project Summary

This repository is a curated collection of research papers on prompt engineering for Vision-Language Models (VLMs). It targets researchers and practitioners in AI and computer vision, offering a structured overview of techniques for adapting VLMs to tasks such as multimodal-to-text generation, image-text matching, and text-to-image synthesis. Its main value is a single, categorized index for following how VLM prompting research is evolving.

How It Works

The repository divides prompting methods into "hard prompts" (task instructions, in-context learning, retrieval-based prompting, chain-of-thought) and "soft prompts" (prompt tuning, prefix tuning), focusing on techniques that do not alter the base VLM architecture. It covers three main VLM types: multimodal-to-text generation (e.g., Flamingo), image-text matching (e.g., CLIP), and text-to-image generation (e.g., Stable Diffusion). Papers are organized by VLM type and prompting category, and each entry lists the paper's title, venue, year, and code availability.
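To make the hard/soft distinction concrete, here is a minimal sketch, not taken from the repository or any listed paper. A hard prompt is a discrete, human-readable text template; the CLIP-style template "a photo of a {}." is an illustrative assumption. A soft prompt is a set of continuous vectors prepended to the input token embeddings, and in prompt tuning only those vectors are trained while the VLM stays frozen. The function names (`hard_prompt`, `init_soft_prompt`, `prepend_soft_prompt`) are hypothetical, and plain Python lists stand in for tensors. Prefix tuning differs in that it injects learned vectors into every layer rather than only at the input, which this sketch does not show.

```python
import random
from typing import List

Vector = List[float]


def hard_prompt(class_name: str, template: str = "a photo of a {}.") -> str:
    """Hard prompt: fill a human-readable text template with a class name.

    The default template is an illustrative CLIP-style assumption.
    """
    return template.format(class_name)


def init_soft_prompt(n_ctx: int, dim: int, seed: int = 0) -> List[Vector]:
    """Soft prompt: n_ctx continuous vectors, each of length dim.

    In prompt tuning these would be the only trainable parameters;
    the VLM's own weights stay frozen. Small Gaussian init is an assumption.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]


def prepend_soft_prompt(
    soft_prompt: List[Vector], token_embeddings: List[Vector]
) -> List[Vector]:
    """Concatenate along the sequence axis: [ctx_1..ctx_n, tok_1..tok_m]."""
    # All vectors must share the model's embedding size.
    dims = {len(v) for v in soft_prompt + token_embeddings}
    if len(dims) > 1:
        raise ValueError(f"embedding sizes differ: {sorted(dims)}")
    return soft_prompt + token_embeddings


if __name__ == "__main__":
    print(hard_prompt("dog"))  # a photo of a dog.
    ctx = init_soft_prompt(n_ctx=4, dim=8)
    tokens = [[0.0] * 8 for _ in range(3)]  # stand-in for frozen token embeddings
    print(len(prepend_soft_prompt(ctx, tokens)))  # 7
```

The design point the sketch illustrates is that neither approach touches the base model: hard prompts change only the input text, while soft prompts add a small number of trainable input vectors.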

Quick Start & Requirements

This repository is a curated list of papers and does not have direct installation or execution requirements. It serves as a reference guide.

Highlighted Details

  • Comprehensive coverage of prompting techniques across three major VLM categories.
  • Detailed categorization of prompting methods (hard vs. soft, sub-categories).
  • Links to code repositories are provided for listed papers where available.
  • Includes papers on applications, responsible AI, adversarial attacks, and bias in VLMs.

Maintenance & Community

The repository is maintained by Jindong Gu and Shuo Chen, with contact information provided for contributions, corrections, and suggestions. The primary reference is the survey paper "A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models."

Licensing & Compatibility

The repository itself does not specify a license. The licensing of the individual papers and their associated code would need to be checked on a per-paper basis.

Limitations & Caveats

The repository is a static list of papers and does not provide executable code or models. The rapidly evolving nature of the field means new research may not be immediately incorporated.

Health Check
Last commit: 4 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 23 stars in the last 90 days
