PromptKD: unsupervised prompt distillation for vision-language models
PromptKD is an unsupervised framework for distilling knowledge from large Vision-Language Models (VLMs) to smaller target models using unlabeled domain images. It is designed for researchers and practitioners working with VLMs who need to adapt powerful models to specific domains efficiently without requiring labeled data for the target domain. The primary benefit is achieving strong performance on downstream tasks with a lightweight student model by leveraging a pre-trained, high-quality teacher model.
How It Works
PromptKD employs a novel two-stage unsupervised prompt distillation approach. First, it utilizes a pre-trained, large CLIP teacher model to generate soft labels for unlabeled domain images. Second, it distills this knowledge into a lightweight student model by having the student mimic the teacher's output, specifically by reusing the teacher's high-quality text features as shared class vectors. This method avoids training a separate text encoder for the student, making the process more efficient and effective.
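The two-stage idea above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: random arrays stand in for CLIP image features, and the helper names (`logits`, `kl_distill_loss`) are our own. The key point it demonstrates is that both teacher and student score images against the *same* frozen teacher text features, and the student is trained to match the teacher's softened class distribution.

```python
import numpy as np

def softmax(z, t=1.0, axis=-1):
    """Numerically stable softmax with temperature t."""
    z = z / t
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_distill_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)  # teacher soft labels (stage 1)
    q = softmax(student_logits, temperature)  # student predictions (stage 2)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

def logits(img, txt, scale=100.0):
    """CLIP-style logits: scaled cosine similarity against shared class vectors."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    return scale * img @ txt.T

# Simulated features: in PromptKD the text features come from the frozen
# teacher text encoder and are reused by the student as shared class vectors.
rng = np.random.default_rng(0)
num_classes, dim, batch = 10, 512, 4
text_features = rng.standard_normal((num_classes, dim))  # teacher text features (shared)
teacher_img = rng.standard_normal((batch, dim))          # teacher image features
student_img = rng.standard_normal((batch, dim))          # student image features

loss = kl_distill_loss(logits(student_img, text_features),
                       logits(teacher_img, text_features))
print(f"distillation loss: {loss:.4f}")
```

In training, `loss` would be minimized with respect to the student's image encoder and learnable prompts only; the text features stay fixed, which is why no student text encoder is needed.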
Quick Start & Requirements
PromptKD builds on the Dassl.pytorch library. Installation instructions are provided in INSTALL.md, and dataset preparation is covered in DATASETS.md.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats