xytian1008: Attribute-guided prompt tuning for vision-language models
ArGue enhances soft prompt tuning for Vision-Language Models (VLMs) by mitigating distribution shift and spurious correlations. It targets researchers and practitioners seeking improved performance in novel class prediction and out-of-distribution generalization tasks, offering a method to align VLMs more robustly with visual concepts.
How It Works
ArGue introduces three core components to soft prompt tuning. "Attribute-Guided Prompting" augments prompts with LLM-generated visual attributes ([soft tokens] + [class name] + [attribute]). "Attribute Sampling" refines this by clustering attributes semantically and selecting the most visually relevant ones (N=3 per class) based on CLIP text features and training image similarity, significantly reducing computational overhead while filtering irrelevant attributes. "Negative Prompting" (ArGue-N) further suppresses spurious correlations, particularly background cues, by training the model to output uniform distributions under specifically crafted negative prompts.
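The three components can be illustrated with a small, self-contained sketch. It is not the repository's code: the function names, the "X" placeholder for learned soft tokens, the comma-separated prompt layout, the tiny k-means routine, and the hand-written 3-D vectors standing in for CLIP features are all assumptions made for illustration. The uniform-target cross-entropy is likewise only one plausible way to push outputs toward a uniform distribution under negative prompts; the exact ArGue-N objective may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_prompt(class_name, attribute, n_ctx=4):
    """[soft tokens] + [class name] + [attribute]; 'X' stands in for a learned token."""
    return " ".join(["X"] * n_ctx) + f" {class_name}, {attribute}"

def kmeans_labels(feats, k, iters=10):
    """Tiny deterministic k-means on cosine similarity (assumed clustering method).
    Centroids are initialised from the first k feature vectors."""
    centroids = [list(f) for f in feats[:k]]
    labels = [0] * len(feats)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(f, centroids[c])) for f in feats]
        for c in range(k):
            members = [f for f, lab in zip(feats, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_attributes(names, text_feats, image_feats, k=3):
    """Cluster attributes by text features, then keep from each cluster the one
    attribute whose mean similarity to the training-image features is highest."""
    labels = kmeans_labels(text_feats, k)

    def relevance(i):
        return sum(cosine(text_feats[i], img) for img in image_feats) / len(image_feats)

    picked = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picked.append(names[max(members, key=relevance)])
    return picked

def uniform_target_loss(logits):
    """Cross-entropy between a uniform target and softmax(logits).
    It is smallest (log K for K classes) when the prediction is uniform."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(x - log_z for x in logits) / len(logits)

# Toy stand-ins for CLIP text features of six candidate attributes of "cat".
names = ["whiskers", "pointed ears", "striped fur",
         "long whiskers", "triangular ears", "soft fur"]
text_feats = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.1, 1.0],
              [0.9, 0.2, 0.0], [0.2, 0.9, 0.1], [0.1, 0.0, 0.9]]
# Toy stand-ins for CLIP image features of two training images of the class.
image_feats = [[1.0, 0.5, 0.1], [0.9, 0.6, 0.2]]

selected = select_attributes(names, text_feats, image_feats, k=3)
prompts = [build_prompt("cat", attr) for attr in selected]
```

In this sketch, selection keeps one attribute per cluster, so the number of prompts per class stays at the cluster count (three here, matching N=3) rather than growing with the number of generated attributes.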
Quick Start & Requirements
Install: pip install -r requirements.txt. Install the dassl library separately following its official instructions.
Prerequisites: torch, dassl, and clip. Attribute generation requires access to the GPT-3 API.
Attribute generation: run python generate_descriptors.py to generate attributes via GPT-3.
Attribute selection: run bash scripts/ARGUE/select_attr.sh to cluster attributes and select representative ones.
Training and evaluation: scripts cover base-to-new generalization (base2new_train.sh, base2new_test.sh) and OOD generalization (xd_train.sh, xd_test.sh).
Highlighted Details
Maintenance & Community
The project builds upon established frameworks like CoOp/CoCoOp and utilizes the Dassl training framework and CLIP backbone. Attribute generation relies on GPT-3. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the README.
Licensing & Compatibility
This project is licensed under the MIT License, which is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
Attribute generation requires access to the GPT-3 API, which may incur costs. Dataset preparation relies on external instructions from the CoOp project. The code release follows the paper's acceptance at CVPR 2024.