Few-shot learner research paper using cascaded foundation models
CaFo addresses the challenge of few-shot image classification by cascading multiple foundation models. It targets researchers and practitioners seeking to improve classification performance with limited labeled data, leveraging diverse pre-trained models for state-of-the-art results.
How It Works
CaFo employs a "Prompt, Generate, then Cache" strategy. GPT-3 produces textual prompts that give CLIP richer linguistic context, DALL-E generates synthetic images to augment the few-shot training data, and a learnable cache model adaptively combines the predictions from CLIP and DINO. The aim is to unify the strengths of different pre-training paradigms for better few-shot learning.
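To make the "Cache" step concrete, here is a minimal NumPy sketch of a key-value cache over few-shot features, followed by an agreement-weighted ensemble with zero-shot logits. It is an illustration under stated assumptions, not the repository's code: the function names, the `beta` and `alpha` hyperparameters, the exponential affinity kernel, and the agreement-based weighting rule are all hypothetical choices, and random vectors stand in for real CLIP and DINO features.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def cache_logits(query, keys, one_hot_labels, beta=5.0):
    """Key-value cache lookup: similarity-weighted vote over few-shot labels.

    beta is an assumed sharpness hyperparameter, not a value from the source.
    """
    affinity = l2_normalize(query) @ l2_normalize(keys).T      # (n_query, n_shots)
    return np.exp(-beta * (1.0 - affinity)) @ one_hot_labels   # (n_query, n_classes)


def adaptive_ensemble(zero_shot, cache_preds, alpha=1.0):
    """Weight each cache prediction by its agreement with the zero-shot prediction.

    The agreement rule (dot product of class distributions) is an assumption
    chosen for illustration.
    """
    p_zs = softmax(zero_shot)
    agreement = np.stack(
        [(softmax(p) * p_zs).sum(axis=-1) for p in cache_preds], axis=-1
    )                                                          # (n_query, n_caches)
    weights = softmax(agreement)                               # rows sum to 1
    mixed = sum(weights[:, i:i + 1] * p for i, p in enumerate(cache_preds))
    return zero_shot + alpha * mixed


# Toy data: 3 classes, 2 shots per class, 4 query images.
rng = np.random.default_rng(0)
labels = np.eye(3)[np.repeat(np.arange(3), 2)]                 # (6, 3) one-hot
clip_keys, dino_keys = rng.normal(size=(6, 8)), rng.normal(size=(6, 16))
clip_query, dino_query = rng.normal(size=(4, 8)), rng.normal(size=(4, 16))
zero_shot = rng.normal(size=(4, 3))   # stand-in for CLIP image-text logits

logits = adaptive_ensemble(
    zero_shot,
    [cache_logits(clip_query, clip_keys, labels),
     cache_logits(dino_query, dino_keys, labels)],
)
print(logits.shape)
```

In a trainable version of this idea, the cached keys would be parameters fine-tuned on the few-shot set (including any synthetic images) rather than fixed arrays.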
Quick Start & Requirements
Create a conda environment (conda create -n cafo python=3.7), activate it (conda activate cafo), and install dependencies (pip install -r requirements.txt). PyTorch and torchvision with matching CUDA versions are required (conda install pytorch torchvision cudatoolkit).
See DATASET.md for dataset preparation.
Use the .yaml files in configs/ for dataset-specific settings and hyperparameters.
To run on ImageNet: CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml (or main.py for other datasets).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats