PromptKD by zhengli97

Research paper for unsupervised prompt distillation in vision-language models

Created 1 year ago · 322 stars · Top 85.5% on SourcePulse

Project Summary

PromptKD is an unsupervised framework for distilling knowledge from large Vision-Language Models (VLMs) to smaller target models using unlabeled domain images. It is designed for researchers and practitioners working with VLMs who need to adapt powerful models to specific domains efficiently without requiring labeled data for the target domain. The primary benefit is achieving strong performance on downstream tasks with a lightweight student model by leveraging a pre-trained, high-quality teacher model.

How It Works

PromptKD employs a novel two-stage unsupervised prompt distillation approach. First, it utilizes a pre-trained, large CLIP teacher model to generate soft labels for unlabeled domain images. Second, it distills this knowledge into a lightweight student model by having the student mimic the teacher's output, specifically by reusing the teacher's high-quality text features as shared class vectors. This method avoids training a separate text encoder for the student, making the process more efficient and effective.
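The two-stage idea can be sketched in a few lines of PyTorch. This is an illustrative sketch under assumed tensor shapes, not the repository's actual code: the feature dimension, batch size, and all variable names are hypothetical, and real teacher/student features would come from CLIP encoders rather than random tensors.

```python
import torch
import torch.nn.functional as F

# Stage 1: the frozen teacher's text encoder yields one feature per class.
# These "shared class vectors" are reused by the student, so the student
# trains no text encoder of its own. Shapes here are illustrative.
num_classes, dim = 10, 512
teacher_text = F.normalize(torch.randn(num_classes, dim), dim=-1)  # frozen

def clip_logits(image_feats, text_feats, temperature=0.07):
    # CLIP-style similarity logits between image and class text features.
    image_feats = F.normalize(image_feats, dim=-1)
    return image_feats @ text_feats.t() / temperature

# Image features for a batch of unlabeled domain images.
teacher_img = torch.randn(32, dim)                       # large frozen teacher
student_img = torch.randn(32, dim, requires_grad=True)   # lightweight student

# Stage 2: distillation -- the student mimics the teacher's soft labels.
with torch.no_grad():
    soft_labels = F.softmax(clip_logits(teacher_img, teacher_text), dim=-1)
loss = F.kl_div(
    F.log_softmax(clip_logits(student_img, teacher_text), dim=-1),
    soft_labels,
    reduction="batchmean",
)
loss.backward()  # gradients flow only into the student
```

Because both teacher and student score images against the same teacher text features, no labels are needed: the teacher's softmax distribution over classes serves as the supervision signal.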

Quick Start & Requirements

  • Installation: Install the Dassl.pytorch library. Instructions are in INSTALL.md.
  • Prerequisites:
    • PyTorch.
    • Pre-trained CLIP models (ViT-L/14 or ViT-B/16) from OpenAI are recommended.
    • Pre-trained teacher CLIP models are available via Baidu Yun, TeraBox, and Google Cloud.
    • Dataset preparation instructions are in DATASETS.md.
  • Setup: Requires downloading CLIP model weights and preparing datasets. Training a teacher model is optional but recommended for optimal results.
  • Links: Paper, Project Page, Poster

Highlighted Details

  • Outperforms existing prompt learning methods on 11 diverse recognition datasets.
  • Demonstrates strong generalization ability in base-to-novel and cross-dataset evaluations.
  • Reuses high-quality teacher text features, simplifying student model training.
  • Framework is based on PromptSRC, MaPLe, Co-CoOp, and CoOp repositories.

Maintenance & Community

  • The primary contact is Zheng Li (zhengli97[at]qq.com).
  • Issues can be submitted on GitHub.
  • A Zhihu article is available for Chinese speakers.
  • The project is associated with Nankai University and Ant Group.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. The underlying repositories it builds on (PromptSRC, MaPLe, Co-CoOp, CoOp) may carry their own licenses.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The Stanford Cars dataset link may be broken; the dataset is provided in GitHub releases.
  • Accuracy of self-trained teacher models can vary and requires careful validation against provided tables.
  • The README does not specify the exact Python version or other core dependencies beyond PyTorch.
Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 30 stars in the last 90 days
