Research paper on multimodal prompt learning for vision-language models
This repository provides the official implementation of MaPLe (Multi-modal Prompt Learning), a CVPR 2023 paper. It addresses the sub-optimality of adapting only one modality (vision or language) in CLIP-like models by learning prompts for both branches simultaneously, fostering synergy between them and improving generalization to novel classes and unseen domain shifts. It is aimed at researchers and practitioners who work with vision-language models and need stronger performance and adaptability.
How It Works
MaPLe learns prompts for both the vision and language branches of CLIP, explicitly conditioning vision prompts on their language counterparts. This coupling allows for mutual propagation of gradients, promoting synergy between modalities. Furthermore, it employs deep prompting, learning multi-modal prompts across multiple transformer blocks in both branches to progressively capture synergistic behavior and rich context. This approach aims to overcome the limitations of uni-modal prompting by enabling dynamic adjustment of both representation spaces.
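The coupling idea can be illustrated with a short PyTorch sketch. The module below is a minimal illustration, not the repository's code: the class name, prompt length, prompting depth, and embedding dimensions (512 for CLIP's text encoder, 768 for a ViT-B/16 image encoder) are assumptions chosen for the example. Language prompts are learnable parameters, and each vision prompt is produced by a per-layer linear coupling function applied to its language counterpart, so gradients from both branches update the same shared parameters.

```python
# Minimal sketch of MaPLe-style multi-modal prompt coupling.
# Names, prompt length, depth, and dimensions are assumptions for illustration.
import torch
import torch.nn as nn


class MultiModalPrompts(nn.Module):
    """Learnable language prompts plus vision prompts derived from them."""

    def __init__(self, n_prompts=2, depth=9, txt_dim=512, vis_dim=768):
        super().__init__()
        # One set of learnable language prompt tokens per prompted transformer layer.
        self.text_prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(n_prompts, txt_dim) * 0.02) for _ in range(depth)]
        )
        # Coupling functions: project each language prompt into the vision
        # embedding space, so both branches share (and update) the same parameters.
        self.couplers = nn.ModuleList(
            [nn.Linear(txt_dim, vis_dim) for _ in range(depth)]
        )

    def forward(self):
        # Per-layer (language_prompt, vision_prompt) pairs for deep prompting.
        return [(p, f(p)) for p, f in zip(self.text_prompts, self.couplers)]


prompts = MultiModalPrompts()
txt_p, vis_p = prompts()[0]
print(txt_p.shape, vis_p.shape)  # torch.Size([2, 512]) torch.Size([2, 768])
```

In the method itself, these per-layer prompt pairs are injected into the first several transformer blocks of both encoders (deep prompting) rather than only at the input layer.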
Quick Start & Requirements
Setup and usage are documented in the repository: INSTALL.md (installation and environment setup), DATASETS.md (dataset preparation), and RUN.md (training and evaluation).
Limitations & Caveats
The README does not detail specific limitations, unsupported platforms, or known bugs. The project is presented as an official implementation of a published research paper.