PromptKD: unsupervised prompt distillation for vision-language models
PromptKD is an unsupervised framework for distilling knowledge from large Vision-Language Models (VLMs) to smaller target models using unlabeled domain images. It is designed for researchers and practitioners working with VLMs who need to adapt powerful models to specific domains efficiently without requiring labeled data for the target domain. The primary benefit is achieving strong performance on downstream tasks with a lightweight student model by leveraging a pre-trained, high-quality teacher model.
How It Works
PromptKD employs a novel two-stage unsupervised prompt distillation approach. First, it utilizes a pre-trained, large CLIP teacher model to generate soft labels for unlabeled domain images. Second, it distills this knowledge into a lightweight student model by having the student mimic the teacher's output, specifically by reusing the teacher's high-quality text features as shared class vectors. This method avoids training a separate text encoder for the student, making the process more efficient and effective.
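The two-stage idea above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: random arrays stand in for CLIP image features, and the helper names (`logits`, `kl_distill_loss`) are our own. The key point it demonstrates is that both teacher and student score images against the *same* frozen teacher text features, and the student is trained to match the teacher's softened class distribution.

```python
import numpy as np

def softmax(z, t=1.0, axis=-1):
    """Numerically stable softmax with temperature t."""
    z = z / t
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_distill_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)  # teacher soft labels (stage 1)
    q = softmax(student_logits, temperature)  # student predictions (stage 2)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

def logits(img, txt, scale=100.0):
    """CLIP-style logits: scaled cosine similarity against shared class vectors."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    return scale * img @ txt.T

# Simulated features: in PromptKD the text features come from the frozen
# teacher text encoder and are reused by the student as shared class vectors.
rng = np.random.default_rng(0)
num_classes, dim, batch = 10, 512, 4
text_features = rng.standard_normal((num_classes, dim))  # teacher text features (shared)
teacher_img = rng.standard_normal((batch, dim))          # teacher image features
student_img = rng.standard_normal((batch, dim))          # student image features

loss = kl_distill_loss(logits(student_img, text_features),
                       logits(teacher_img, text_features))
print(f"distillation loss: {loss:.4f}")
```

In training, `loss` would be minimized with respect to the student's image encoder and learnable prompts only; the text features stay fixed, which is why no student text encoder is needed.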
Quick Start & Requirements
PromptKD builds on the Dassl.pytorch library. Installation instructions are provided in INSTALL.md, and dataset preparation is covered in DATASETS.md.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats