CaFo by OpenGVLab

Few-shot learning research paper using cascaded foundation models

created 2 years ago
374 stars

Top 76.9% on sourcepulse

View on GitHub
Project Summary

CaFo addresses the challenge of few-shot image classification by cascading multiple foundation models. It targets researchers and practitioners seeking to improve classification performance with limited labeled data, leveraging diverse pre-trained models for state-of-the-art results.

How It Works

CaFo employs a "Prompt, Generate, then Cache" strategy: GPT-3 generates richer textual prompts for CLIP, DALL-E synthesizes additional training images for augmentation, and a learnable cache model adaptively blends the predictions of CLIP and DINO. The aim is to unify these diverse pre-training paradigms and get the most out of each for few-shot learning.
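
To make the "Cache" step concrete, below is a minimal PyTorch sketch of a Tip-Adapter-style cache branch combined with a simplified adaptive blend of CLIP and DINO predictions. It is an illustration under stated assumptions, not the repository's code: the function names, the alpha/beta defaults, and the exact weighting scheme are simplifications of CaFo's adaptive ensemble.

    import torch

    def cache_logits(query_feat, cache_keys, cache_values, beta=5.5):
        # Affinity between a test feature and the stored training features
        # (few-shot images plus generated ones), turned into class logits
        # via the one-hot cache values.
        affinity = query_feat @ cache_keys.T                      # (B, N_train)
        return (-beta * (1.0 - affinity)).exp() @ cache_values    # (B, C)

    def blend_predictions(clip_feat, dino_feat,
                          clip_keys, dino_keys, cache_values,
                          clip_zero_shot_logits, alpha=1.0):
        # One cache branch per visual backbone.
        clip_cache = cache_logits(clip_feat, clip_keys, cache_values)
        dino_cache = cache_logits(dino_feat, dino_keys, cache_values)

        # Weight each branch by how closely its distribution agrees with
        # CLIP's zero-shot prediction -- a simplified stand-in for CaFo's
        # distribution-similarity ensemble.
        zs = clip_zero_shot_logits.softmax(dim=-1)
        agree_clip = (clip_cache.softmax(dim=-1) * zs).sum(-1, keepdim=True)
        agree_dino = (dino_cache.softmax(dim=-1) * zs).sum(-1, keepdim=True)
        w = torch.softmax(torch.cat([agree_clip, agree_dino], dim=-1), dim=-1)

        blended = w[:, 0:1] * clip_cache + w[:, 1:2] * dino_cache
        return clip_zero_shot_logits + alpha * blended

In the repository itself, the cache keys are presumably built from the few-shot training images together with the DALL-E-generated ones, and the blending hyperparameters are set per dataset in the shipped configs.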

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n cafo python=3.7), activate it (conda activate cafo), and install dependencies (pip install -r requirements.txt). PyTorch and torchvision with matching CUDA versions are required (conda install pytorch torchvision cudatoolkit).
  • Data: Download the datasets and pre-trained weights: CLIP weights download automatically, DINO weights are fetched from the linked checkpoint, and the DALL-E/Stable Diffusion generated images are available via the provided links. Organize everything as specified in DATASET.md.
  • Configs: Modify the .yaml files in configs/ for dataset-specific settings and hyperparameters (a scripted example follows this list).
  • Run: CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml (or main.py for other datasets).
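
For the Configs step, the .yaml files can simply be edited by hand; if you prefer to script small sweeps, a sketch like the following works, assuming PyYAML is installed. The key name "shots" is a guess for illustration, not the repository's verified schema, so check the actual files in configs/ first.

    import yaml

    path = "configs/imagenet/16shot.yaml"
    with open(path) as f:
        cfg = yaml.safe_load(f)

    print(cfg)          # inspect which hyperparameters the config exposes
    cfg["shots"] = 16   # assumed key name, for illustration only

    with open(path, "w") as f:
        yaml.safe_dump(cfg, f)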

Highlighted Details

  • Achieves state-of-the-art few-shot classification performance across 11 datasets.
  • Integrates CLIP, DINO, DALL-E, and GPT-3 for comprehensive knowledge utilization (a prompt-encoding sketch follows this list).
  • Employs a learnable cache mechanism for adaptive prediction blending.
  • Supports data augmentation via synthetic image generation.
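
To illustrate the GPT-3-to-CLIP "Prompt" step mentioned above, here is a hedged sketch that averages CLIP text embeddings of GPT-3-style class descriptions into zero-shot classifier weights. The two classes and their sentences are invented for illustration, and the snippet uses OpenAI's clip package directly rather than the repository's own prompt files and loaders.

    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load("RN50", device=device)

    # Stand-ins for GPT-3-generated class descriptions (invented examples).
    gpt3_style_prompts = {
        "golden retriever": [
            "a photo of a golden retriever, a large dog with a wavy golden coat",
            "a golden retriever fetching a ball on a lawn",
        ],
        "tabby cat": [
            "a photo of a tabby cat with striped grey fur",
            "a tabby cat curled up asleep on a sofa",
        ],
    }

    with torch.no_grad():
        weights = []
        for sentences in gpt3_style_prompts.values():
            tokens = clip.tokenize(sentences).to(device)
            feats = model.encode_text(tokens)                 # (num_prompts, dim)
            feats = feats / feats.norm(dim=-1, keepdim=True)
            weights.append(feats.mean(dim=0))                 # average over prompts
        text_classifier = torch.stack(weights)                # (num_classes, dim)
        text_classifier /= text_classifier.norm(dim=-1, keepdim=True)

    # Zero-shot logits for a normalized image feature f: 100.0 * f @ text_classifier.T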

Maintenance & Community

The project sees little ongoing activity: the last commit was about 2 years ago, maintainer responsiveness has historically been around 1 week, and no pull requests or issues were opened in the past 30 days (see the Health Check figures below).

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README. The project builds on Tip-Adapter, CLIP, DINO, DALL-E, and CuPL, each of which carries its own license; users should verify compatibility before commercial or closed-source use.

Limitations & Caveats

  • Requires specific versions of PyTorch and CUDA.
  • Downloading and organizing pre-trained models and datasets is a manual step.
  • The README does not explicitly state the project's license, which may impact commercial adoption.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days
