CaFo by OpenGVLab

Research code for few-shot learning with cascaded foundation models

Created 2 years ago
376 stars

Top 75.5% on SourcePulse

Project Summary

CaFo addresses few-shot image classification by cascading multiple pre-trained foundation models. It targets researchers and practitioners who need strong classification performance from limited labeled data, combining diverse pre-training paradigms to reach state-of-the-art results.

How It Works

CaFo follows a "Prompt, Generate, then Cache" pipeline. GPT-3 generates rich textual prompts for CLIP's text encoder (Prompt); DALL-E synthesizes additional training images for data augmentation (Generate); and a learnable cache model adaptively fuses the predictions of CLIP and DINO (Cache). Together, these steps unify complementary pre-training knowledge from vision-language contrastive learning (CLIP), self-supervised vision learning (DINO), generative modeling (DALL-E), and large language models (GPT-3) for stronger few-shot learning.
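The cache step can be sketched as follows. This is an illustrative, Tip-Adapter-style computation in plain Python; the hyperparameter names alpha and beta are the conventional ones from that line of work, and this is not the repository's actual implementation (which operates on PyTorch tensors):

```python
# Illustrative sketch of cache-model blending (Tip-Adapter style), as used
# conceptually by CaFo. Not the repository's actual code.
import math

def cache_logits(query, cache_keys, cache_values, beta=5.5):
    """Cache-model logits for one L2-normalized query feature.

    query: feature vector for a test image.
    cache_keys: one stored feature row per few-shot training image.
    cache_values: one-hot label row per few-shot training image.
    """
    logits = [0.0] * len(cache_values[0])
    for key, value in zip(cache_keys, cache_values):
        affinity = sum(q * k for q, k in zip(query, key))  # cosine similarity
        weight = math.exp(-beta * (1.0 - affinity))        # sharpened affinity
        for c, v in enumerate(value):
            logits[c] += weight * v
    return logits

def blend(zero_shot_logits, cached, alpha=1.0):
    """Final prediction: zero-shot CLIP logits plus weighted cache logits."""
    return [z + alpha * c for z, c in zip(zero_shot_logits, cached)]
```

In CaFo, the blending weights themselves are learned so the model can decide adaptively how much to trust each source; the fixed alpha here is a simplification.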

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n cafo python=3.7), activate it (conda activate cafo), and install dependencies (pip install -r requirements.txt). PyTorch and torchvision with matching CUDA versions are required (conda install pytorch torchvision cudatoolkit).
  • Data: Download datasets and pre-trained weights for CLIP (automatic), DINO (provided link), and DALL-E/Stable Diffusion generated images (provided links). Organize data as specified in DATASET.md.
  • Configs: Modify .yaml files in configs/ for dataset-specific settings and hyperparameters.
  • Run: CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml (or main.py for other datasets).
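The steps above, collected into one shell session (the repository URL is assumed from the project and organization names; adjust the cudatoolkit build to match your driver):

```shell
# Clone and set up the environment (Python 3.7, as recommended upstream)
git clone https://github.com/OpenGVLab/CaFo.git
cd CaFo
conda create -n cafo python=3.7
conda activate cafo
pip install -r requirements.txt
# Install PyTorch/torchvision with a CUDA build matching your driver
conda install pytorch torchvision cudatoolkit

# Train/evaluate on ImageNet 16-shot (use main.py for the other datasets)
CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml
```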

Highlighted Details

  • Achieves state-of-the-art few-shot classification performance across 11 datasets.
  • Integrates CLIP, DINO, DALL-E, and GPT-3 for comprehensive knowledge utilization.
  • Employs a learnable cache mechanism for adaptive prediction blending.
  • Supports data augmentation via synthetic image generation.

Maintenance & Community

This is a research-paper companion repository: created two years ago, with 376 stars, and no longer actively maintained (see Health Check below).

Licensing & Compatibility

  • The README does not state a license. The project builds on Tip-Adapter, CLIP, DINO, DALL-E, and CuPL, each of which carries its own license; users should verify compatibility before commercial or closed-source use.

Limitations & Caveats

  • Requires specific versions of PyTorch and CUDA.
  • Downloading and organizing pre-trained models and datasets is a manual step.
  • The README does not explicitly state the project's license, which may impact commercial adoption.
Health Check

  • Last commit: 2 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
